Week 1 - Computational project organization
Student learning objectives
Students will be able to:
Describe the relationship between an RStudio project, a git repo, and a GitHub repo
Organize files and folders to maximize reproducibility and collaboration
Create a project website on GitHub
Today’s lesson
tl;dr Each of your analyses should have a standalone directory with all the code and data necessary to produce your results.
This lesson has readings, an exercise, and an assessment. Feel free to jump around and do them in whatever order works best for you. I suggest the following order:
- Read Bryan (2017) about project-oriented workflows
- If you’re new to git, read Bryan (2018) about version control and scientific analysis. Consider yourself familiar with git if you know what commit, push, and pull do. If that’s the case, read Braga et al. (2023) about using GitHub for managing scientific projects.
- Complete the Exercise
- Complete the Assessment
But first, watch this TikTok.
Readings
The readings will help clarify why the skills you’re learning will improve your productivity and make your science easier to share, reproduce, and collaborate on.
Project workflows
(Bryan 2017) on project-oriented workflows
Optionally: chapter 3 of (British Ecological Society and Cooper 2018), Organising projects for reproducibility
Version Control
(Bryan 2018) if you’re new to version control
(Braga et al. 2023) if you already have some experience
Exercise
In this exercise you’re going to set up an analysis to take advantage of project organization tools in RStudio, git, and GitHub. We’ll use this framework for exercises and assessments throughout the rest of the course.
Goals
- Create an RStudio project and git repository on your computer
- Create a corresponding GitHub repository online
- Set up an analysis-friendly folder structure
- Activate GitHub pages to give your project a website
Step 1: Create an RStudio project and git repository
Open RStudio. From the top menu, click on File > New Project to launch the New Project Wizard.
Choose New Directory > New Project. You should now see the Create New Project prompt.

The Directory name is the name of your project. Call it “bioe215lesson1”. When it comes to project directory naming, there are a couple of best practices to follow. Stick to letters and numbers, and avoid special characters like spaces, underscores, and dashes. This ensures your directory name will be compatible with the naming requirements for R, git, and GitHub.
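The naming rule above can be checked mechanically. Here is a minimal sketch you could run in the Terminal pane. The function name is_valid_project_name is hypothetical (it is not part of RStudio, git, or GitHub), and the pattern simply encodes "letters and numbers only" as described above.

```shell
# Hypothetical helper: succeeds only if the name is made of letters and digits.
is_valid_project_name() {
  case "$1" in
    "" | *[!A-Za-z0-9]*) return 1 ;;  # empty, or contains some other character
    *) return 0 ;;
  esac
}

# A name that follows the rule, and one with a space and a dash that does not.
is_valid_project_name "bioe215lesson1" && echo "ok: bioe215lesson1"
is_valid_project_name "bioe215 lesson-1" || echo "rename: bioe215 lesson-1"
```

The first call succeeds and the second fails, so both messages print. The check is stricter than any single tool requires, which is the point: one simple rule that is safe for all of them.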
Next you’ll choose where to put your project on your computer. In the screenshot above, Create project as subdirectory of: is set to /Users/frank/Documents/GitHub, which is where I keep projects on my computer. Yours may default to somewhere unhelpful. I suggest creating a subfolder in your “Documents” directory called “GitHub”, then using the “Browse…” button to navigate there.
Make sure the check box for Create a git repository is checked.
Click on Create Project.

Once RStudio finishes creating your project, you should see something like the screenshot above. Make sure you see both the git pane and the project name. Let the instructor know when you’re done with this step so they can check everything looks right.
Step 2: Create a GitHub repository
RStudio projects and git repos both live locally on your computer. GitHub repos are remote repositories on the internet. Now you’re going to create a remote GitHub repository.
First, install the usethis package, which has helpful functions for project organization and management. At the R console, run install.packages("usethis").
If you don’t have a GitHub account yet, now’s the time to create one. Go to github.com and create an account.
Make sure git is configured on your machine with the correct user name and email. In the Terminal pane (not the Console pane) [1], run git config --list. Your user.name should be your name and user.email should be the email you used for your GitHub account. If they’re not configured correctly, run the following R command in the Console, changing YOURNAME and YOUREMAIL accordingly. This is a one-time setup command; you won’t have to do it again until you replace your computer.
usethis::use_git_config(user.name = "YOURNAME", user.email = "YOUREMAIL")
[Is the double colon :: unfamiliar? This operator tells R to look for a function in a certain package. So this command uses the use_git_config() function from the usethis package. Alternatively, you can call library(usethis) and then call the function directly.]
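For reference, the same check and fix can be done in the Terminal with git itself. The sketch below is an illustration, not part of the lesson’s required steps. It works in a throwaway repository and uses --local so it does not touch your real settings; for the one-time setup described above you would use --global instead. YOURNAME and YOUREMAIL are placeholders.

```shell
# Work in a throwaway repository so nothing here changes your real configuration.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet

# Placeholder identity; replace with your own name and GitHub email.
git config --local user.name "YOURNAME"
git config --local user.email "YOUREMAIL"

# Read the values back (git config --list shows these alongside other settings).
git config --get user.name
git config --get user.email
```

If the two values printed at the end match what you expect, git will label your commits correctly.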
The last step before you can connect RStudio to GitHub is saving your credentials. The safest way to do that is with a Personal Access Token, or PAT. A PAT tells GitHub you are who you say you are and prevents anyone else from messing with your remote repositories [2]. If you know you already have a PAT, continue to the next step. Otherwise, creating one is easy with usethis. Call usethis::create_github_token(), which will launch GitHub in your web browser. Give your token the name “RSTUDIO” and change the Expiration from 30 days to 90 days. Scroll all the way to the bottom and click the green Generate token button.

You should see a long line of text with a green check mark next to it. That’s your token. Copy it, then go back to RStudio. At the Console, run gitcreds::gitcreds_set(). You’ll see the prompt ? Enter new password or token:. Paste your token and press Enter. Now RStudio and GitHub can talk to each other [3].
All that’s left to do now is create your GitHub repo. usethis helps automate this step. First, call usethis::use_git() and commit the uncommitted files. Then call usethis::git_default_branch_rename() [4]. Finally, call usethis::use_github(). You should now see your GitHub repository in your browser. Once again, let the instructor know when you’re done with this step so they can check everything looks right.
Step 3: Set up folder structure
There are a lot of ways you can organize a computational project. It’s less important which system you use than it is to use a system consistently. Consistency reduces headaches! In this exercise, you’re going to use a modified version of the system described by Anna Krystalli in the Organising projects for reproducibility chapter of British Ecological Society and Cooper (2018) (some of the following comes from there verbatim). Create the following folders in your project.
- The data folder contains all input data (and metadata) used in the analysis.
- The paper folder contains the manuscript.
- The figs directory contains figures generated by the analysis.
- The output folder contains any type of intermediate or output files (e.g. simulation outputs, models, processed datasets, etc.). You might separate this and also have a cleaned-data folder.
- The R directory contains R scripts with function definitions.
- The reports folder contains Quarto documents [5] that describe the analysis or report on results.
- The docs folder contains the rendered versions of the reports.
- The scratch folder contains early prototypes and other code I don’t fully understand yet.
The scripts that actually do things are stored in the root directory, but if your project has many scripts, you might want to organize them in a directory of their own.
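You can create these folders through RStudio’s Files pane, or all at once from the Terminal. The sketch below is one way to do the latter. It runs in a temporary directory (the variable project_root is only a stand-in for your real project folder) so that trying it out doesn’t clutter anything.

```shell
# Stand-in for your project's root directory; in practice, cd into your project instead.
project_root="$(mktemp -d)"
cd "$project_root"

# One folder per component of the analysis, as described in the list above.
mkdir -p data paper figs output R reports docs scratch

# List the folders that now exist.
ls -d */
```

mkdir -p leaves existing folders alone, so the command is safe to re-run in a project that already has some of them.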
Here’s an example of what your project could look like when you’re done. Notice the numbered scripts: these run the steps of your analysis in order. The rest of the folders hold the components of your analysis so they’re easy to find.

The biggest benefit of adopting this system is the cognitive space it frees up in your brain. Any brainpower you were devoting to figuring out where to put a file or where to find something can now be reallocated to your actual science.
A quick note about data. If your raw data are small (<100 MB) and you have permission to make them public [6], then it’s ok to store them on GitHub. If they’re big or private, you’ll need to keep them off GitHub. That’s what the .gitignore file is for. If necessary, call usethis::use_git_ignore("data/*"), which will add your data/ directory to .gitignore and keep it out of GitHub.
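To see what that .gitignore entry does, here is a small Terminal sketch in a throwaway repository. The file data/raw.csv and its contents are made up for illustration. The pattern data/* is the same one passed to use_git_ignore() above, and git check-ignore is git’s own way of asking whether a path is ignored.

```shell
# Throwaway repository with a made-up data file.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
mkdir -p data
echo "id,value" > data/raw.csv

# Ignore everything inside data/ (the same pattern used in the text).
echo "data/*" >> .gitignore

# git check-ignore exits with status 0 when the path is ignored.
git check-ignore -q data/raw.csv && echo "data/raw.csv is ignored"
```

Ignored files never show up in the Git pane, so they can’t be staged or pushed by accident.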
Step 4: Activate GitHub pages
GitHub has an option for creating a website out of your repository. This is an incredible feature for working with collaborators! You can put your methods and results in Quarto documents that GitHub serves for you. Compared to email threads, a project website does a much better job keeping co-authors in the loop and getting new collaborators up to speed. The folder organization system you created in Step 3 makes this pretty simple.
Let’s start by creating a simple README. GitHub will turn everything in your docs/ directory into the project website. By default, anything called “index” will be your landing page. Create a text file called docs/index.md. The “.md” suffix stands for Markdown. You’ll learn more about Markdown next week. For now, it’s enough to know Markdown is a text-only format that allows basic formatting. When you open index.md, you’ll see RStudio’s visual editor. At the top of the editor, switch from Visual to Source. Add the following text.
# README
This is my project website for the Computational Project Organization lesson of *Data Science for Eco/Evo*.
The reading assessment answers are [here](assessment.md).
My project organization notes are [here](proj_org_notes.md).
Your editor should look like this.

Here’s what the special characters do:
- # creates a header
- *text* puts text in italics
- [text](url) creates a link
You’ll create assessment.md and proj_org_notes.md in the assessment.
Let’s see it in action. First, activate Pages on GitHub. Go to your GitHub repo in your browser. Click on Settings and choose Pages under Code and automation. Under Source it should say Deploy from a branch. Under Branch, change None to main and the directory from / (root) to /docs. Click Save.
Now you need to give GitHub something to deploy. Go back to RStudio and commit all your new and changed files. To do this, go to the Git pane. You’ll see a list of new and modified files. Check the boxes next to all of them to stage the files. This tells git you’d like to commit them. Click on the Commit button to launch the Review Changes dialog and add a commit message. As a rule of thumb, keep commit messages short (<50 characters). Click the Commit button, then click the Push button.
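The buttons in the Git pane correspond to git commands you could also run in the Terminal. The sketch below walks through stage, commit, and push under some loud assumptions: a local bare repository (remote.git) stands in for GitHub so the example runs offline, the identity is a placeholder, and the file is a one-line version of the README. With a real project, origin would already point at your GitHub repo.

```shell
# A bare repository stands in for GitHub so this sketch runs offline.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"

# A local repository with one new file, like docs/index.md in the lesson.
git init --quiet "$workdir/local"
cd "$workdir/local"
git config user.name "YOURNAME"    # placeholder identity for the demo
git config user.email "YOUREMAIL"
mkdir -p docs
echo "# README" > docs/index.md

git add docs/index.md                         # stage (the check box in the Git pane)
git commit --quiet -m "Add project README"    # commit with a short message
git branch -M main                            # make sure the branch is called main
git remote add origin "$workdir/remote.git"   # connect the stand-in remote
git push --quiet -u origin main               # push (the Push button)

# Show the commit that now exists both locally and on the remote.
git log --oneline
```

Staging and committing change only your local repo. Nothing reaches the remote until you push.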
It will take GitHub a minute or two to render your site. Return to your GitHub repo in your browser and switch to the Actions tab. You’ll see a job running called pages build and deployment with a yellow dot next to it. Click on the job and wait for the steps to all turn green.

Click on the link under deploy. Notice the URL: it’s [yourgithubname].github.io/[yourreponame]. That’s the pattern GitHub Pages uses. You should see your README rendered in all its glory. Call the instructor over for debugging, if necessary, followed by a high five.
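Because the URL follows a fixed pattern, you can work it out for any project before the site is even deployed. A tiny sketch, with hypothetical values for the user name and repository name:

```shell
# Hypothetical values; substitute your own GitHub user name and repository name.
github_user="yourgithubname"
repo_name="bioe215lesson1"

# Project sites are served at <user>.github.io/<repo>.
pages_url="https://${github_user}.github.io/${repo_name}/"
echo "$pages_url"
```

This is the URL you’ll share with collaborators and submit for the assessment.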
Assessment
Reading questions
Create a Markdown file called docs/assessment.md (the file your README links to) and put your answers to the following questions about the readings in it.
Project workflows
Bryan (2017)
- What problems can setwd() cause in your scripts and how do RStudio projects address them?
- When you call rm(list=ls()), what is removed from your environment? What’s left over that restarting your R session would remove? What’s the keyboard shortcut for restarting your R session?
Version control
You either read Bryan (2018) or Braga et al. (2023). Answer the questions for the paper you read.
Bryan (2018)
- The basic git commands are commit, push, and pull. Which commands happen locally (i.e., on your computer)? Which happen remotely?
- Why do diffs work for source code (e.g., .R files) but not Word documents (i.e., .docx files)?
- Why is Markdown useful for GitHub repos?
Braga et al. (2023)
- Imagine you’re working with a few collaborators on an analysis. Come up with two examples of Issues you might open. How would using Issues differ from communicating over email?
- What are three ways GitHub features can promote open science practices?
Project organization notes
You were (probably) exposed to a lot of new material in this lesson. In this part of the assessment, you’ll create a cheat sheet for yourself. Create a markdown file called docs/proj_org_notes.md. In that file, create a short document you can use to guide you when you create your next analysis project. Consider which steps only need to be configured once versus which have to be repeated for every project. If there were steps you found counterintuitive or confusing today, make a note of why you did them or how you figured them out.
Submission
Submit your assessment by emailing your GitHub pages url to mczapans [at] ucsc [dot] edu [7].
References
Footnotes
[1] If you can’t find the Terminal pane, ask your instructor for help. Sometimes it needs to be activated the first time you use it.
[2] The internet is full of vandals. C’est la vie.
[3] PATs are kind of like your git configuration in that they’re “set it and forget it” commands. You will only need to update your PAT every three months if you set Expiration to 90 days. The rest of the time you don’t need to think about it. And if you’re wondering why we set PATs to expire at all, see the previous footnote about vandals.
[4] We won’t cover branches in this course, but they’re an important part of version control. Suffice it to say, every repo has a main branch which by default is called “master”. That’s racist terminology, so there’s a concerted effort to use “main” instead.
[5] “Quarto, what’s that?” you might say. Quarto documents combine text, code, figures, and tables. They’re extremely useful for scientific analyses and writing! You’ll learn more about them next week. If you’re familiar with RMarkdown, Quarto is its next evolutionary step.
[6] If you hold the rights to your data and they’re safe to put online (i.e., nothing sensitive), I strongly encourage you to make them public from the start. You might worry about getting scooped, which is understandable. But in my experience publishing datasets, it’s incredibly difficult to get other scientists to look at your data even when you’re literally advertising them. In my opinion, the advantages of making data public far outweigh the risks. It facilitates collaboration and makes it easier to publish your data when you wrap up your project.
[7] TBH I haven’t figured out yet how to access UCSC’s Canvas as an instructor. You’ll submit future assessments through Canvas once I figure that out.