Final project

In weeks 1-6 you learned how to set up a computational analysis (using RStudio, git, and GitHub) and techniques for analyzing data (creating and accessing data structures, writing functions, and using the tidyverse). In weeks 7-9, you’ll apply those skills to replicate a key finding (e.g., a figure or table) from a published paper.

As science generally becomes more computational, including eco/evo biology, the traditional methods section has become less capable of capturing the complexities of the scientific process. Documenting complex data and code in a manuscript is difficult, and as a result many papers do not adequately document how to replicate their results. Replication studies, such as (Boersch-Supan 2021) in ReScience C, demonstrate the challenges of following the computational methods in published papers. Replication assignments are also effective teaching tools for computational methods and open science practices (Marwick et al. 2019).

You’ll work with a group to write your own replication study following these steps:

Choose a study for replication
Identify the necessary (meta)data source(s)
Create a computational analysis to replicate a key finding
Write a (brief) manuscript documenting your process

Choose a study for replication

As a group, decide on a published paper for your final project. Identify the key finding you will replicate. The principal requirement for this paper is it must have a data availability statement indicating an open data repository where you can find the data.

Submission

Designate one member of your group to be your repository maintainer. Create a GitHub repository for your project. Add a document to the repository’s website ([your username].github.io/[repository name]) describing the paper and finding you plan to replicate.

Create an issue in the repo, add link to the new document, list the group members’ names, and tag me (FlukeAndFeather).

Identify the necessary (meta)data source(s)

Find the data you’ll need for your replication, as well as the metadata describing how to reuse those data. Determine whether the (meta)data are sufficient to replicate your key finding.

Submission

Create another document for the repository’s website describing the data and metadata, and how they support the key finding.

Create another issue in the repo, link to the new document, and tag me (FlukeAndFeather).

Create a computational analysis to replicate a key finding

This is the meat of the final project. Write an analysis using the skills you learned during weeks 1-6 to replicate your key finding.

Make a plan

Before you write any code, make a written plan. What will your folder/file structure look like? Will you need to preprocess the data? What intermediate output data will you produce? What packages and functions will you need to use?

Submission

Put your analysis plan into a document. A flowchart or some other kind of sketch will probably be useful. Add the document to your repository’s website.

Create another issue in the repo, link to the new document, and tag me (FlukeAndFeather).

Execute the plan

Working as a team, write your analysis. You will undoubtedly have to deviate from your plan as you learn more about the data and methods - that’s ok!

Submission

Create a document detailing the intermediate steps and final results of your analysis. Add the document to your repository’s website.

Create another issue in the repo, link to the new document, and tag me (FlukeAndFeather).

Write a (brief) manuscript documenting your process

Create a document called paper.qmd in your reports/ directory for your final manuscript. This manuscript should contain the following.

A brief (~1 paragraph) introduction describing the paper and finding you’re replicating.
A summary of the data associated with the finding. This should include a bare minimum description of field/laboratory methods (common garden experiment? line transects?) as well as the data’s representation (if it’s a CSV file, what do the columns and rows represent?)
A summary of the analysis methods. How did you process the data? If you fit a statistical model to the data, how did you do that?
A description of your results. If you replicated a figure or table, add it here.
A brief (~2-3 paragraphs) discussion. Compare your results to the original paper. Did you find the same result or were they different? If the original paper’s methods left a step out (e.g., a parameter value), how did you choose what to use?

General guidelines:

Use Quarto’s technical writing features
- Cross-reference figures and tables using code chunk labels. Don’t write (Figure 1) directly - let Quarto number them for you.
- Cite references using a .bib file. The tutorial has good instructions for how to do this.
Save file changes and commit/push early and often!
Use the scratch/ folder judiciously. As you start exploring the data/analysis, creating an R script in the scratch/ is a good way to start. But as you understand the data/analysis better, make sure to move code out of scratch/ into functions in R/ or standalone scripts (e.g., 00_preprocess_data.R).

Submission

Add paper.qmd to your repository’s website. Create your last issue in the repo, link to the paper, and tag me (FlukeAndFeather).

References

Boersch-Supan, Philipp H. 2021. “[Re] Modeling Insect Phenology Using Ordinal Regression and Continuation Ratio Models,” June. https://doi.org/10.5281/ZENODO.5006005.

Marwick, Ben, Li-Ying Wang, Ryan Robinson, and Hope Loiselle. 2019. “How to Use Replication Assignments for Teaching Integrity in Empirical Archaeology.” Advances in Archaeological Practice 8 (1): 78–86. https://doi.org/10.1017/aap.2019.38.