Results reproducibility

Introduction
W-SLDA mechanism of results reproducibility
Good practices
How to reproduce the results
List of reproducibility packs
Known issues with existing reproducibility packs
- New J. Phys. 25, 033013 (2023)

Introduction

Results reproducibility is a critical issue in science. It has already been noted that in many cases, reproducing your results even after a few months (the typical time scale of the referee process) may be challenging. In most cases, having the same code version is insufficient, but you also need precise knowledge about the input parameters used, and the same input data must be provided. Since the standard methodology in science is based on try-and-fail methods, typically, the researcher ends up with many datasets. Only a few are released for publication, while others serve as experimental runs. Under such conditions, tracking changes introduced to codes in the research process becomes problematic. The W-SLDA Toolkit implements a methodology that does it automatically and allows for reproducing the results (up to machine precision). Namely, the generated results are always accompanied by the reproducibility pack, where complete information needed to reproduce them is included.

For the meaning of each file, see here.

W-SLDA mechanism of results reproducibility

Developers of the W-SLDA Toolkit recognize the need for intrinsically implemented support that will simplify the process of reproducing the results. To comply with this requirement, the following mechanism has been implemented (called a reproducibility pack):

Each file generated by the W-SLDA Toolkit in the header provides basic info about the code version that has been used; for example, the header of wlog file may look like:

# CREATION TIME OF THE LOG: Sun Feb  7 15:29:44 2021
# EXECUTION COMMAND       : ./st-wslda-2d input.txt
# CODE NAME               : "W-SLDA-TOOLKIT"
# VERSION OF THE CODE     : 2021.01.27
# COMPILATION DATE & TIME : Feb  7 2021, 15:19:57

When executing the code, all user-definable files are recreated and attached to the output-set. For example, if the user sets outprefix as test, then among output files, there will be:

test_input.txt             # input file used for calculations
test_predefines.h          # predefines selected at compilation stage
test_problem-definition.h  # user's definition of the problem 
test_logger.h              # user's logger
test_machine.h             # machine configuration used in the calculations
test.stdout                # standard output generated by the code
test_checkpoint.dat.init   # checkpoint file that was used as input (st codes only)
test_extra_data.dat        # Binary file with the extra_data array (if provided)
test_reprowf.tar           # reproducibility pack for restoring wave functions that were used as input (td codes only)

This provides the complete information required to reproduce your results (up to machine precision).

Good practices

For each project, use a separate folder; do not mix results from various projects in the same folder. Use a meaningful name for the folder.
Use meaningful outprefix names.
Do not modify output files, except wtxt file. This one is designed to store various metadata information, including your comments. The wtxt file is easy to reproduce if you destroy it accidentally, which is not the case with other files. Add your comments/remarks/etc in the form of comments starting with #.
When copying results to a new location/machine, copy all files assisted with the run. The simplest way is to execute the command (for more info, see here):

scp outprefix* new_location

When printing messages to stdout use functions:

// prints to stdout and to file outprefix.stdout
void wprintf( const char * format, ... );       

// prints to stream (like stdout or stderr) and to file outprefix.stdout         
void wfprintf(FILE *stream,  const char * format, ... );

These are analogs of printf and fprintf with the difference that the message will also be added to outprefix.stdout.

To learn more about good practices related to results reproducibility issues, see:

Build a Reproducible and Maintainable Data Science Project

How to reproduce the results

Get the code: Check one of the provided *.h files to determine the version of code that should be used to reproduce the results. In general, W-SLDA assures backward compatibility and it most cases you can use the latest version o the code. See here for a list of releases.
Create a working folder: Use template folders (st-project-template or td-project-template) to create an empty project.
Overwrite user-defined files: Overwrite the user-defined files *.h and the input file by those provided in the reproducibility pack.

List of reproducibility packs

Since 2022, reproducibility packs have been provided for all our papers where new results generated by the W-SLDA are presented. For the list of our publications and links to the reproducibility packs, see here.

Known issues with existing reproducibility packs

New J. Phys. 25, 033013 (2023)

Problem with a compilation of type:

/net/software/v1/software/binutils/2.38-GCCcore-11.3.0/bin/ld: obj//wslda_functionals.o: in function `compute_potentials_bdg':
wslda_functionals.c:(.text+0x1ecc): undefined reference to `scattering_length'
/net/software/v1/software/binutils/2.38-GCCcore-11.3.0/bin/ld: obj//wslda_functionals.o: in function `compute_potentials_sldae':
wslda_functionals.c:(.text+0x2c27): undefined reference to `scattering_length'
/net/software/v1/software/binutils/2.38-GCCcore-11.3.0/bin/ld: obj//wslda_functionals.o: in function `compute_energy_sldae':
wslda_functionals.c:(.text+0x41c9): undefined reference to `scattering_length'
collect2: error: ld returned 1 exit status
make: *** [/net/pr2/projects/plgrid/plgg_ntg_wut/share/cold-atoms//hpc-engine/mk.st:95: d2d] Error 1

Solution: Add to the problem-definition.h function:

double scattering_length(int ix, int iy, int iz, int it, double *params, size_t extra_data_size, void *extra_data)
{
    return input->sclgth; // by default return value from input file.
}