Overview
Teaching: 30 min Exercises: 10 min
Questions
What is reproducibility?
Why should I keep reproducibility in mind for my research?
How can I make my research reproducible?
Objectives
understand what it means for research to be reproducible
understand the steps needed to make research reproducible
In a single sentence, reproducibility is the ability to exactly re-create an earlier piece of research or analysis given the same data. That is, if I hand over a journal article that describes my method, along with the input data, another person can arrive at exactly the same findings.
The image above is the reproducibility spectrum. It ranges from research that is only described in a publication all the way to full replication: research with a publication, executable code, and linked data, like a frozen machine in the cloud that can be executed to re-run your whole study. We want neither extreme, but rather somewhere in the middle, depending on your field, your data, and your research.
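To make the definition concrete, here is a minimal sketch of an analysis that gives the exact same answer when re-run on the same data. The function name `bootstrap_mean`, the sample measurements, and the seed value are all made up for illustration; the only point is that every source of randomness is pinned down explicitly.

```python
import random
import statistics


def bootstrap_mean(data, n_resamples, seed):
    """Estimate the mean of `data` by averaging bootstrap resamples."""
    # A local, seeded generator keeps the result independent of any
    # global random state elsewhere in the program.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Draw len(data) values from data, with replacement.
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(resample))
    return statistics.fmean(means)


# Hypothetical measurements, used only for this illustration.
measurements = [4.1, 3.9, 4.4, 4.0, 4.3]

first = bootstrap_mean(measurements, n_resamples=1000, seed=42)
second = bootstrap_mean(measurements, n_resamples=1000, seed=42)

# Same data and same seed: the two runs agree exactly.
print(first == second)
```

Without the fixed seed, each run would give a slightly different number. Seeded results can still change between Python versions, which is one reason to record the environment alongside the code and data.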
Reproducibility Discussion
What measures do you take to ensure your analyses are:
- reproducible,
- replicable,
- robust?
https://etherpad.wikimedia.org/p/ghw2018-reproducible-discussion
PS: shamelessly copied from the awesome slides available at: https://github.com/oceanhackweek/ohw2018_tutorials/blob/master/day5/reproducible_research_and_tools/.
For this tutorial I’m going to cover the “test, document, and publish your code” part.
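In research, experiments/results are not trusted unless:
- The experimental setup is tested
- The method is well-documented
- We can demonstrate that our results are reproducible and reliable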
So why would scientific software be any different?
The code test-document-publish cookie cutter!
(Yep! Another cookie cutter for scientific Python packages!)
Example: https://nsls-ii.github.io/scientific-python-cookiecutter
Hack session
- create a Python package
- choose a license (https://choosealicense.com/)
- write doctests
- bug? fix test / re-run
- set up Travis-CI / CircleCI
- set up AppVeyor
- upload source dist and docs
- create a DOI
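As a starting point for the "write doctest" step, here is a minimal sketch of a function whose docstring doubles as its test. The function `celsius_to_kelvin` is a made-up example, not part of the cookie cutter; only the standard-library `doctest` module is assumed.

```python
import doctest


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin.

    The examples below are both documentation and tests.

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    >>> celsius_to_kelvin(-300.0)
    Traceback (most recent call last):
        ...
    ValueError: temperature below absolute zero
    """
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


if __name__ == "__main__":
    # Run every example found in this module's docstrings and
    # report how many failed and how many were attempted.
    print(doctest.testmod())
```

If the file is saved as, say, `convert.py` (a hypothetical name), the same examples can also be run with `python -m doctest convert.py`. That command is also a simple thing to run from a CI service: if a bug creeps in, an example fails, you fix it and re-run.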
Key Points
reproducibility is the ability to exactly re-create an earlier piece of research or analysis given the same data
the less time it takes someone to get the same result you did with the resources provided, the more “reproducible” your research/analysis is
in research, experiments/results are not trusted unless:
- The experimental setup is tested
- The method is well-documented
- We can demonstrate that our results are reproducible and reliable
simple steps to reproducible research are:
- record the project’s provenance
- data and metadata curation
- establish a testing/analysis workflow
- test, document, and publish your code
- share it!
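The first step in the list above, recording the project's provenance, can be sketched in a few lines of standard-library Python. This is only one possible approach, and the function names (`sha256_of_file`, `build_provenance_record`, `write_provenance`) and the choice of fields are assumptions made for this illustration.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance_record(data_path):
    """Collect basic facts about the run: when, with what, on which data."""
    data_path = Path(data_path)
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": sys.argv,
        "data_file": data_path.name,
        # The checksum lets others confirm they hold the same input data.
        "data_sha256": sha256_of_file(data_path),
    }


def write_provenance(record, out_path):
    """Save the record as human-readable JSON next to the results."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A script could call `build_provenance_record` on its input file at the start of each run and pass the result to `write_provenance`, so every set of results travels with a note of the data and environment that produced it.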