Questions tagged [reproducible-research]

Reproducible research is the idea that the result of scientific research should be published with data and code in order to make it possible for other researchers to verify the results.

Reproducible research may be especially important to you if your investigation involves large amounts of data or very complex calculations.

One possible set of tools for reproducible research is using with or .


227 questions
105 votes, 10 answers

Unit tests for functions in a Jupyter notebook?

I have a Jupyter notebook that I plan to run repeatedly. It has functions in it, the structure of the code is this: def construct_url(data): ... return url def scrape_url(url): ... # fetch url, extract data return parsed_data for i…
Richard
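A minimal sketch of testing notebook functions in place, using only the standard library. In a notebook, `unittest.main(argv=[''], exit=False)` is a common shortcut (it keeps `unittest` from parsing the kernel's arguments and from calling `sys.exit()`); the sketch loads the test case explicitly instead, which behaves the same inside or outside Jupyter. The body of `construct_url` and the base URL are hypothetical, since the question elides them.

```python
import unittest
from urllib.parse import urlencode

# Hypothetical implementation: the question does not show the real body.
BASE_URL = "https://example.com/search"  # assumed endpoint, for illustration only


def construct_url(data: dict) -> str:
    """Build a query URL from a dict of parameters (sorted for a stable result)."""
    return f"{BASE_URL}?{urlencode(sorted(data.items()))}"


class TestConstructUrl(unittest.TestCase):
    def test_parameters_are_encoded(self):
        self.assertEqual(construct_url({"q": "a b"}), "https://example.com/search?q=a+b")

    def test_order_is_stable(self):
        self.assertEqual(construct_url({"b": 2, "a": 1}), construct_url({"a": 1, "b": 2}))


# Run the tests in the current cell. Loading the TestCase explicitly avoids
# unittest.main() reading the kernel's command line or exiting the process.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestConstructUrl)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Functions that hit the network, such as `scrape_url`, are usually easier to test if fetching and parsing are split, so the parser can be fed a saved HTML string.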
48 votes, 3 answers

Fully reproducible parallel models using caret

When I run 2 random forests in caret, I get the exact same results if I set a random seed: library(caret) library(doParallel) set.seed(42) myControl <- trainControl(method='cv', index=createFolds(iris$Species)) set.seed(42) model1 <-…
Zach
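The underlying issue is not specific to caret: a single seed set in the parent process does not determine what each parallel worker draws. caret's `trainControl()` has a `seeds` argument intended for exactly this case. The general idea, sketched here in Python rather than R, is to give each task its own generator seeded from the task index, so results do not depend on how many workers run or in which order tasks finish. `fit_fold` is a hypothetical stand-in for fitting one resample, and threads are used only to keep the sketch small; the same reasoning applies to worker processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MASTER_SEED = 42


def fit_fold(fold: int) -> float:
    """Hypothetical stand-in for fitting one resampling fold.

    Each task builds its own generator from (MASTER_SEED, fold), so its draws
    depend only on the fold index, never on which worker runs it.
    Seeds stay distinct as long as there are fewer than 1000 folds.
    """
    rng = random.Random(MASTER_SEED * 1000 + fold)
    sample = [rng.random() for _ in range(100)]
    return sum(sample) / len(sample)


def run(n_folds: int, n_workers: int) -> list[float]:
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(fit_fold, range(n_folds)))


serial = run(10, n_workers=1)
parallel = run(10, n_workers=4)
print(serial == parallel)  # True: per-task seeding makes the worker count irrelevant
```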
35 votes, 1 answer

Example of using dput()

I am a new user here, and my questions are not being fully answered because they are not reproducible. I read the thread on producing reproducible code, but to no avail. Specifically, I am lost on how to use the dput() function. Could someone provide a step…
Tyler
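`dput()` prints R code that recreates an object, so the text can be pasted into a question and run by someone else. For readers coming from Python, a rough analogue for plain built-in data is a printed literal: `pprint.pformat()` (or `repr()`) renders lists, dicts, strings and numbers as text that `ast.literal_eval()` can read back. This is a Python illustration of the idea, not how `dput()` works; the sample data is made up. For a pandas DataFrame, `df.head().to_dict()` gives a similar pasteable literal that `pd.DataFrame(...)` accepts.

```python
import ast
import pprint

# A small, made-up sample of the data someone might share in a question.
data = {
    "id": [1, 2, 3],
    "group": ["a", "b", "a"],
    "score": [0.5, 1.25, 2.0],
}

# Like dput(): produce text that recreates the object when pasted back in.
text = pprint.pformat(data, width=60)
print(text)

# Anyone can rebuild an identical object from the printed text.
rebuilt = ast.literal_eval(text)
assert rebuilt == data
```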
31 votes, 10 answers

Programmatically add cells to an IPython notebook for report generation

I have seen a few of the talks by IPython developers about how to convert an IPython notebook to a blog post, a PDF, or even an entire book (~min 43). The PDF-to-X converter interprets the IPython cells which are written in markdown or code and…
zach
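An `.ipynb` file is a JSON document in the nbformat schema, so cells can be generated programmatically. The `nbformat` package has helpers for this (`nbformat.v4.new_notebook`, `new_markdown_cell`, `new_code_cell`); the sketch below writes the JSON with the standard library instead, so the structure is visible. It assumes nbformat 4.4, where cell `id` fields are still optional; the section titles, cell contents and output file name are made up.

```python
import json
import tempfile
from pathlib import Path


def markdown_cell(text: str) -> dict:
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source: str) -> dict:
    # Unexecuted code cell: no outputs and no execution count yet.
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": source}


def build_report(sections: list[tuple[str, str]]) -> dict:
    """Turn (heading, code) pairs into a notebook: one markdown + one code cell each."""
    cells = []
    for heading, code in sections:
        cells.append(markdown_cell(f"## {heading}"))
        cells.append(code_cell(code))
    # nbformat 4.4: cell ids are optional (they became required in 4.5).
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


nb = build_report([
    ("Load data", "import csv\nrows = list(csv.reader(open('data.csv')))"),
    ("Summary", "print(len(rows))"),
])

# Write wherever suits; a temp directory keeps the example side-effect free.
out = Path(tempfile.gettempdir()) / "report.ipynb"
out.write_text(json.dumps(nb, indent=1), encoding="utf-8")
```

The resulting notebook can then be executed or converted with the usual tools (for example `jupyter nbconvert`).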
25 votes, 9 answers

Reproducible results in Tensorflow with tf.set_random_seed

I am trying to generate N sets of independent random numbers. I have a simple piece of code that shows the problem for 3 sets of 10 random numbers. I notice that even though I use tf.set_random_seed to set the seed, the results of different runs do not…
Mehdi Rezaie
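`tf.set_random_seed` is the TensorFlow 1.x name; TensorFlow 2 uses `tf.random.set_seed`. Independently of the framework, one way to get N sets that differ from each other yet repeat exactly across runs is to derive one sub-seed per set from a single master seed. The sketch shows that recipe with Python's standard `random` module rather than TensorFlow; the function name `independent_sets` is hypothetical.

```python
import random


def independent_sets(n_sets: int, size: int, master_seed: int) -> list[list[float]]:
    """Generate n_sets lists of `size` uniform numbers, reproducibly.

    A master generator hands out one sub-seed per set; each set then gets its
    own generator, so asking for more sets later does not change earlier ones.
    """
    master = random.Random(master_seed)
    sub_seeds = [master.getrandbits(64) for _ in range(n_sets)]
    sets = []
    for s in sub_seeds:
        rng = random.Random(s)  # one generator per set
        sets.append([rng.random() for _ in range(size)])
    return sets


first_run = independent_sets(3, 10, master_seed=1)
second_run = independent_sets(3, 10, master_seed=1)
print(first_run == second_run)  # True: same master seed, same sets
```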
23 votes, 6 answers

Set working directory in Python / Spyder so that it's reproducible

Coming from R, using setwd to change the directory is a big no-no against reproducibility because others do not have the same directory structure as mine. Hence, it's recommended to use relative path from the location of the script. IDEs slightly…
Heisenberg
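The Python counterpart of avoiding `setwd()` is to build paths from the script's own location rather than changing or relying on the working directory. `Path(__file__)` gives the script's path when it runs as a file; `__file__` is not defined in an interactive console or a notebook, so this sketch falls back to the working directory there. The helper name `project_path` and the `data/measurements.csv` layout are hypothetical.

```python
from pathlib import Path


def project_path(*parts: str) -> Path:
    """Return an absolute path relative to this script's folder.

    Unlike os.chdir() (Python's setwd), this changes no global state, and it
    resolves the same way on any machine with the same project layout.
    """
    try:
        base = Path(__file__).resolve().parent
    except NameError:
        # __file__ is undefined in a REPL or notebook; use the working directory.
        base = Path.cwd()
    return base.joinpath(*parts)


# Example: a data file kept in a "data" subfolder next to the script.
csv_path = project_path("data", "measurements.csv")
print(csv_path)
```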
14 votes, 1 answer

How to save and load random number generator state in PyTorch?

I am training a DL model in PyTorch, and want to train my model in a deterministic way. As written in this official guide, I set random seeds like this: np.random.seed(0) torch.manual_seed(0) torch.backends.cudnn.deterministic =…
hajduistvan
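Each generator involved exposes its state: PyTorch has `torch.get_rng_state()` / `torch.set_rng_state()` (plus `torch.cuda.get_rng_state_all()` / `torch.cuda.set_rng_state_all()` for GPUs), NumPy has `np.random.get_state()` / `np.random.set_state()`, and Python's `random` module has `random.getstate()` / `random.setstate()`. The pattern is the same for all of them: save the state alongside a checkpoint and restore it before resuming. This sketch uses only the standard library; the checkpoint file name is made up.

```python
import pickle
import random
import tempfile
from pathlib import Path

CHECKPOINT = Path(tempfile.gettempdir()) / "rng_checkpoint.pkl"  # hypothetical location


def save_rng_state(path: Path) -> None:
    # random.getstate() returns a plain tuple, which pickles cleanly.
    # In a PyTorch script, torch.get_rng_state() could go in the same dict.
    with path.open("wb") as f:
        pickle.dump({"python_random": random.getstate()}, f)


def load_rng_state(path: Path) -> None:
    with path.open("rb") as f:
        state = pickle.load(f)
    random.setstate(state["python_random"])


random.seed(0)
_ = [random.random() for _ in range(5)]          # "train" for a while
save_rng_state(CHECKPOINT)
expected = [random.random() for _ in range(3)]   # what an uninterrupted run draws next

load_rng_state(CHECKPOINT)                       # e.g. after restarting the job
resumed = [random.random() for _ in range(3)]
print(resumed == expected)  # True
```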
13 votes, 2 answers

knitr - error when importing python module

I am having trouble when running the python engine in knitr. I can import some modules but not others. For example I can import numpy but not pandas. {r, engine='python'} import pandas I get the error. Quitting from lines 50-51 (prepayment.Rmd)…
Glen Thompson
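One possible cause worth ruling out when some modules import and others do not is that the chunk runs under a different Python interpreter than the one where the missing package is installed. A small diagnostic run from a Python chunk shows which interpreter is in use and which modules it can find; if it is the wrong one, knitr's `engine.path` chunk option can point the chunk at a specific interpreter. The module list below is just the two from the question.

```python
import importlib.util
import sys


def check_modules(names: list[str]) -> dict[str, bool]:
    """Report, for this interpreter, whether each module can be found.

    find_spec() locates a top-level module without importing it, so a
    broken package does not abort the check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])
for name, found in check_modules(["numpy", "pandas"]).items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```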
13 votes, 11 answers

List of loaded/imported packages in Julia

How can I get a list of imported/used packages of a Julia session? Pkg.status() lists all installed packages. I'm interested in the ones that were imported/loaded via using ... or import ... It seems that whos() contains the relevant…
Julian
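The question is about Julia, but the same record is useful in any language for a reproducible report: which packages a session actually loaded, with their versions. As a Python illustration only (not the Julia answer), loaded top-level modules can be read from `sys.modules` and versions looked up with `importlib.metadata`. An import name does not always match its distribution name (e.g. `sklearn` vs. `scikit-learn`), so some lookups come back empty. The helper name `loaded_packages` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def loaded_packages() -> dict[str, Optional[str]]:
    """Map each loaded top-level module to its installed version, if known."""
    top_level = sorted({name.split(".")[0] for name in sys.modules
                        if not name.startswith("_")})
    versions: dict[str, Optional[str]] = {}
    for name in top_level:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Standard-library module, or import name differs from distribution name.
            versions[name] = None
    return versions


for name, version in loaded_packages().items():
    if version is not None:
        print(f"{name}=={version}")
```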
12 votes, 1 answer

How can one use Binder (mybinder.org) with private Github repositories?

After reviewing this exact issue (https://github.com/jupyterhub/binderhub/issues/237) it seems that the functionality for this has been implemented with this merged pull request (https://github.com/jupyterhub/binderhub/pull/671). However, I cannot…
12 votes, 1 answer

An Overview of Nix/OS Architecture?

While the Nix/OS wiki and manuals provide a lot of excellent information, I am still having trouble getting an architectural overview. Apologies for the quantity and naivety of the questions; feel free to answer a subset: 1. What constitutes a Nix…
Ixxie
11 votes, 0 answers

Better reproducibility of rPackages (pinning package versions) in Nix compared to Guix

I'm currently evaluating different solutions to enhance/explore reproducibility in my R/Python scientific workflow: data with reproducible analysis (plots, analysis) and paper. As you know, there are two big Linux flavours that offer some solutions: Nix…
reyman64
11 votes, 1 answer

Why are my results still not reproducible?

I want to get reproducible results for a CNN. I use Keras and Google Colab with GPU. In addition to the recommended code snippets that should allow reproducibility, I also added seeds to the layers. ###### This is the first…
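For Keras/TensorFlow, several sources of randomness need fixing before the model is built: Python's `random`, NumPy, TensorFlow, and `PYTHONHASHSEED`. Recent TensorFlow versions also offer `tf.keras.utils.set_random_seed()` to seed the first three at once, and `tf.config.experimental.enable_op_determinism()` asks GPU kernels to run deterministically at some speed cost; even then, some ops may lack a deterministic GPU implementation. The helper `seed_everything` is hypothetical, and its NumPy and TensorFlow parts only run if those packages are installed.

```python
import os
import random


def seed_everything(seed: int, deterministic_ops: bool = True) -> None:
    """Seed the RNGs a Keras/TensorFlow run typically touches (hypothetical helper)."""
    # Only affects child processes: the current interpreter's hash seed is fixed
    # at startup, so export PYTHONHASHSEED before launching Python for full effect.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
    except ImportError:
        return
    tf.random.set_seed(seed)
    if deterministic_ops and hasattr(tf.config.experimental, "enable_op_determinism"):
        # Trades GPU speed for run-to-run determinism (newer TF releases only).
        tf.config.experimental.enable_op_determinism()


seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True
```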
11 votes, 2 answers

Using BERT for next sentence prediction

Google's BERT is pretrained on next sentence prediction tasks, but I'm wondering if it's possible to call the next sentence prediction function on new data. The idea is: given sentence A and given sentence B, I want a probabilistic label for…
Paul
11 votes, 2 answers

Use rmarkdown/knitr to hold all code until the end

I'd like to be able to generate a document using knitr/rmarkdown that keeps all the output together, but leaves the code until the end, ideally as a referenced footnote of sorts (i.e. the code for each figure or output can be looked up in the…
micturalgia