
Results reproducibility

Introduction

Reproducibility of results is a central issue in science. It has been noted that in many cases reproducing your own results, even after only a few months (the typical time scale of the refereeing process), can be challenging. In most cases it is not sufficient to have the same version of the code; you also need precise knowledge of the input parameters that were used. Since scientific work typically proceeds by trial and error, we usually end up with many datasets, of which only a few are eventually published, while the others serve as experimental runs. Tracking the settings used for the various runs then becomes a problem. W-SLDA implements an automatic framework that supports reproduction of results: the generated results are always accompanied by a reproducibility pack.

W-SLDA mechanism of results reproducibility

The developers of W-SLDA Toolkit recognize the need for built-in support that simplifies the process of reproducing results. To meet this requirement, the following mechanism, called the reproducibility pack, has been implemented:

Each file generated by W-SLDA Toolkit provides, in its header, basic information about the code version that was used. For example, the header of a wlog file may look like:

# CREATION TIME OF THE LOG: Sun Feb  7 15:29:44 2021
# EXECUTION COMMAND       : ./st-wslda-2d input.txt
# CODE NAME               : "W-SLDA-TOOLKIT"
# VERSION OF THE CODE     : 2021.01.27
# COMPILATION DATE & TIME : Feb  7 2021, 15:19:57
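The header can also be read programmatically, for example to check that two runs were produced by the same code version. The following is a minimal sketch, assuming the header lines keep the "# KEY : VALUE" layout shown above; header_value and wlog_header_value are hypothetical helper names and are not part of W-SLDA Toolkit.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of W-SLDA Toolkit).
 * Given one header line of the assumed form "# KEY : VALUE", copy VALUE
 * into `value` if KEY matches `key`. Returns 1 on match, 0 otherwise. */
int header_value(const char *line, const char *key, char *value, size_t size)
{
    if (line[0] != '#' || size == 0) return 0;
    const char *p = line + 1;
    while (*p == ' ') p++;                      /* skip blanks after '#' */
    size_t klen = strlen(key);
    if (strncmp(p, key, klen) != 0) return 0;   /* key must start the line */
    p += klen;
    while (*p == ' ') p++;                      /* skip padding before ':' */
    if (*p != ':') return 0;                    /* rejects keys that are only a prefix */
    p++;
    while (*p == ' ') p++;                      /* skip blanks after ':' */
    size_t n = strcspn(p, "\r\n");              /* drop the line terminator */
    while (n > 0 && isspace((unsigned char)p[n - 1])) n--;  /* and trailing blanks */
    if (n >= size) n = size - 1;                /* truncate to fit the buffer */
    memcpy(value, p, n);
    value[n] = '\0';
    return 1;
}

/* Hypothetical helper: scan the file `path` line by line and return the
 * value of the first header line whose key matches. Returns 1 if found. */
int wlog_header_value(const char *path, const char *key, char *value, size_t size)
{
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, f))
        found = header_value(line, key, value, size);
    fclose(f);
    return found;
}
```

Calling wlog_header_value(path, "VERSION OF THE CODE", buf, sizeof buf) on the wlog files of two runs and comparing the returned strings would show whether both runs used the same code version.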

When the code is executed, all user-definable files are recreated and attached to the output set. For example, if the user sets outprefix to test, the output files will include:

test_input.txt             # input file used for calculations
test_predefines.h          # predefines selected at compilation stage
test_problem-definition.h  # user's definition of the problem 
test_logger.h              # user's logger
test.stdout                # standard output generated by the code
test_checkpoint.dat.init   # checkpoint file that was used as input (st codes only)
test_extra_data.dat        # binary file with the extra_data array (if provided)
test_reprowf.tar           # reproducibility pack for restoring wave-functions that were used as input (td codes only)

This provides the full information required to reproduce your results (up to machine precision).
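Before archiving or publishing a run, it may be useful to verify that the pack is complete. The sketch below assumes that the five files listed above without a condition (input, predefines, problem definition, logger, stdout) are produced by every run; the remaining files depend on the code type or on whether extra data was provided, so they are not checked. The names pack_file_name and pack_count_missing are hypothetical and are not part of W-SLDA Toolkit.

```c
#include <stdio.h>

/* Suffixes taken from the file listing above. Assumption: these five files
 * accompany every run, since the listing attaches no condition to them. */
static const char *const pack_suffixes[] = {
    "_input.txt", "_predefines.h", "_problem-definition.h", "_logger.h", ".stdout",
};
enum { PACK_NSUFFIXES = sizeof pack_suffixes / sizeof pack_suffixes[0] };

/* Hypothetical helper: build "<outprefix><suffix>" in `buf`.
 * Returns 1 on success, 0 if the name does not fit into the buffer. */
int pack_file_name(const char *outprefix, const char *suffix, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s%s", outprefix, suffix);
    return n >= 0 && (size_t)n < size;
}

/* Hypothetical helper: return the number of expected files that cannot be
 * opened for reading; the name of each missing file is reported on stderr. */
int pack_count_missing(const char *outprefix)
{
    int missing = 0;
    for (int i = 0; i < PACK_NSUFFIXES; i++) {
        char name[1024];
        FILE *f = NULL;
        if (pack_file_name(outprefix, pack_suffixes[i], name, sizeof name))
            f = fopen(name, "rb");
        if (f) {
            fclose(f);
        } else {
            fprintf(stderr, "missing: %s%s\n", outprefix, pack_suffixes[i]);
            missing++;
        }
    }
    return missing;
}
```

For a run with outprefix set to test, pack_count_missing("test") returns 0 when all five files are present in the current directory.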

Good practices

For each project use a separate folder; do not mix results from different projects in the same folder. Use a meaningful folder name.
Use meaningful outprefix names.
Do not modify output files, except the wtxt file. This file is designed to store various metadata, including your comments. Unlike the other files, the wtxt file is easy to regenerate if you destroy it accidentally. Add your remarks as comment lines starting with #.
When copying results to a new location or machine, copy all files associated with the run. The simplest way is to execute the command (for more info see here):

scp outprefix* new_location

When printing messages to stdout, use the functions:

// prints to stdout and to file outprefix.stdout
void wprintf(const char *format, ...);

// prints to stream (like stdout or stderr) and to file outprefix.stdout
void wfprintf(FILE *stream, const char *format, ...);

These are analogs of printf and fprintf, with the difference that the message is also appended to outprefix.stdout.
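To illustrate the idea of writing each message to two destinations, here is a minimal sketch of a wfprintf-like function. It is not the toolkit's implementation: demo_wfprintf and demo_logfile are hypothetical names, and the log file handle, which the toolkit manages internally, is modeled here as a plain global variable.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the handle of outprefix.stdout.
 * Assumption: it is opened elsewhere before the first message is printed. */
static FILE *demo_logfile = NULL;

/* Sketch of a wfprintf-like function: write the formatted message to
 * `stream` and, if the log file is open, to the log file as well. */
void demo_wfprintf(FILE *stream, const char *format, ...)
{
    va_list args, args_copy;
    va_start(args, format);
    va_copy(args_copy, args);          /* a va_list can be consumed only once */
    vfprintf(stream, format, args);
    if (demo_logfile) {
        vfprintf(demo_logfile, format, args_copy);
        fflush(demo_logfile);          /* keep the log usable if the run aborts */
    }
    va_end(args_copy);
    va_end(args);
}
```

Because every message passes through a single function, the copy stored in the log file cannot drift from what was shown on the screen, which is why plain printf calls should be avoided in user code.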

To learn more about good practices related to results reproducibility, see:

Creating Reproducible Data Science Projects