W-data format

Introduction

W-data was designed to satisfy the following requirements:

Binary data is stored in a conceptually easy format that allows a variety of tools/languages to be used.
Format provides storage for data with time stepping (frames / measurements / cycles).
Data format is suitable for parallel processing (preferably with MPI I/O).
Data is easy to process with VisIt.
It provides an extensible framework: new variables can be created and easily added to an existing data set.
Data is easy to copy between computing systems.
It allows for easy extraction or copying of selected variables.

W-data format concept

A data set consists of a set of files, for example:

test.wtxt           # metadata file, this one should be indicated when opening in VisIt
test_density_a.wdat # binary file with data
test_delta.wdat     # binary file with data
test_current_a.wdat # binary file with data

The content of test.wtxt may look like:

# Comments with additional  info about data set
# Comments are ignored when reading by parser  

NX                       24   # lattice
NY                       28   # lattice
NZ                       32   # lattice
DX                        1   # spacing
DY                        1   # spacing
DZ                        1   # spacing
datadim                   3   # dimension of block size: 1=NX, 2=NX*NY, 3=NX*NY*NZ
prefix                 test   # prefix for files belonging to this data set, binary files have names prefix_variable.format
cycles                   10   # number of cycles (measurements)
t0                        0   # time value for the first cycle
dt                        1   # time interval between cycles

# variables
# tag                  name                    type                    unit                  format
var               density_a                    real                    none                    wdat
var                   delta                 complex                    none                    wdat
var               current_a                  vector                    none                    wdat

# links
# tag                  name                 link-to
link              density_b               density_a
link              current_b               current_a

# consts
# tag                  name                   value
const                    eF                     0.5
const                    kF                       1
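
The scalar entries of such a metadata file can be read with a few lines of C. The sketch below is only an illustration and not the parser shipped with the wdata library: the names wtxt_meta and wtxt_parse_line are hypothetical, tags are assumed to be case-sensitive exactly as written in the example, and the var, link and const lines are deliberately skipped.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the scalar entries of a wtxt file. */
typedef struct {
    int nx, ny, nz;     /* lattice sizes NX, NY, NZ */
    int datadim;        /* 1, 2 or 3 */
    int cycles;         /* number of cycles */
    double t0, dt;      /* time of the first cycle and time step */
    char prefix[128];   /* prefix of the binary file names */
} wtxt_meta;

/* Parse one line of a wtxt file into md.
   Returns 1 if a known scalar tag was stored, 0 otherwise
   (comment, blank line, or a tag this sketch does not handle). */
int wtxt_parse_line(const char *line, wtxt_meta *md)
{
    char buf[512], tag[64], val[128];

    /* work on a copy and cut everything after '#' */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    char *hash = strchr(buf, '#');
    if (hash != NULL) *hash = '\0';

    /* expect "tag value"; anything shorter is skipped */
    if (sscanf(buf, "%63s %127s", tag, val) != 2) return 0;

    if      (strcmp(tag, "NX") == 0)      md->nx = atoi(val);
    else if (strcmp(tag, "NY") == 0)      md->ny = atoi(val);
    else if (strcmp(tag, "NZ") == 0)      md->nz = atoi(val);
    else if (strcmp(tag, "datadim") == 0) md->datadim = atoi(val);
    else if (strcmp(tag, "cycles") == 0)  md->cycles = atoi(val);
    else if (strcmp(tag, "t0") == 0)      md->t0 = atof(val);
    else if (strcmp(tag, "dt") == 0)      md->dt = atof(val);
    else if (strcmp(tag, "prefix") == 0) {
        strncpy(md->prefix, val, sizeof(md->prefix) - 1);
        md->prefix[sizeof(md->prefix) - 1] = '\0';
    }
    else return 0;      /* var, link, const, DX, ... are not handled here */

    return 1;
}
```

A caller would read the file line by line (for example with fgets) and pass each line to wtxt_parse_line.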

In our experience, three types of variables (real, complex, vector) are sufficient and cover more than 90% of applications.

Binary files store data as raw arrays, called datablocks. Each cycle contributes one datablock, and the datablocks follow one another in the file.

The size of a datablock depends on the variable type and the data dimensionality.

real: blocksize=blocklength*8 Bytes
complex: blocksize=blocklength*16 Bytes
vector: blocksize=blocklength*8*3 Bytes

where blocklength is

for datadim=3: blocklength=NX*NY*NZ
for datadim=2: blocklength=NX*NY
for datadim=1: blocklength=NX
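
The two rules above translate directly into code. The following is a minimal sketch; the function names wd_blocklength and wd_blocksize are hypothetical and are not part of the wdata library.

```c
#include <stddef.h>
#include <string.h>

/* Number of elements in one datablock, following the datadim rule.
   Returns 0 for an unsupported datadim. */
size_t wd_blocklength(int datadim, int nx, int ny, int nz)
{
    switch (datadim) {
        case 1:  return (size_t)nx;
        case 2:  return (size_t)nx * ny;
        case 3:  return (size_t)nx * ny * nz;
        default: return 0;
    }
}

/* Size in bytes of one datablock for type "real", "complex" or "vector".
   Returns 0 for an unknown type. */
size_t wd_blocksize(const char *type, size_t blocklength)
{
    if (strcmp(type, "real") == 0)    return blocklength * 8;
    if (strcmp(type, "complex") == 0) return blocklength * 16;
    if (strcmp(type, "vector") == 0)  return blocklength * 8 * 3;
    return 0;
}
```

For the example data set (NX=24, NY=28, NZ=32, datadim=3) the blocklength is 21504, so one real datablock occupies 172032 bytes, a complex one 344064 bytes and a vector one 516096 bytes.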

Note that for vector variables we use the following storage pattern:

To compute the time associated with a given cycle, use: time = t0 + cycleid*dt, where cycleid=0 corresponds to the first cycle.
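
As a small illustration, the cycle index determines both the physical time and the position of the datablock in a header-less wdat file. The helper names below are hypothetical; the offset rule assumes the wdat format (no header), where datablocks for consecutive cycles are stored back to back.

```c
#include <stddef.h>

/* Time associated with a cycle: time = t0 + cycleid*dt. */
double wd_cycle_time(double t0, double dt, int cycleid)
{
    return t0 + cycleid * dt;
}

/* Byte offset of the datablock of a given cycle in a wdat file
   (assumption: no header, blocks stored one after another). */
size_t wd_cycle_offset(size_t blocksize, int cycleid)
{
    return blocksize * (size_t)cycleid;
}
```

With t0=0 and dt=1 from the example metadata file, the last of the 10 cycles (cycleid=9) corresponds to time 9.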

W-data format allows for the representation of the following elements:

variable:
Each variable is stored in a binary file named prefix_varname.format. The variable description has the following format:
var name type unit format
The following formats are allowed:
- wdat: default format for WSLDA codes. Binary files contain raw data (no header).
- dpca: (deprecated) previous format of the cold atomic codes. The binary file contains a 68-byte header that stores additional information about the file content. For this format the wdata library provides only reading functionality.
- npy: binary files are NumPy arrays. This functionality is under construction.
  To switch WSLDA codes to writing in this format, add the following line to the input file:
  dataformat npy
link:
A link is an alternative name for a given variable. WSLDA codes are often used for systems that exhibit symmetries, such as spin symmetry, in which case the densities of spin-a and spin-b particles are exactly the same. To save disk space, only one of them needs to be stored:
var density_a real none wdat
and the other one is defined as a link:
link density_b density_a
constant:
Besides variables, there are typically constants that are useful during data analysis. For example, when making plots it is convenient to express variables in dimensionless form, like delta/eF. The const field provides the user with the values of selected constants:
const eF 0.5
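
To illustrate how variables and links map to files, the sketch below builds the file name prefix_varname.format for a requested name and follows a link if one is defined. The type wd_link and the function wd_file_name are hypothetical, and the sketch assumes that a link points directly to a stored variable (no chains of links) and shares its format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one "link name link-to" entry. */
typedef struct {
    const char *name;    /* alternative name, e.g. density_b */
    const char *target;  /* stored variable, e.g. density_a  */
} wd_link;

/* Write "prefix_variable.format" for the requested name into out,
   replacing the name by its link target when a link exists.
   Returns 0 on success, -1 if the buffer is too small. */
int wd_file_name(char *out, size_t outsize,
                 const char *prefix, const char *name, const char *format,
                 const wd_link *links, size_t nlinks)
{
    const char *resolved = name;

    /* look for a link with this name */
    for (size_t i = 0; i < nlinks; i++) {
        if (strcmp(links[i].name, name) == 0) {
            resolved = links[i].target;
            break;
        }
    }

    int n = snprintf(out, outsize, "%s_%s.%s", prefix, resolved, format);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With the links of the example data set, a request for density_b resolves to test_density_a.wdat, while delta, which has no link, resolves to test_delta.wdat.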

Low-level reading and writing of data

The following code snippet demonstrates how to load a datablock from a wdat file:

// requires <stdio.h> and <stdlib.h>; NX, NY, NZ are the lattice sizes from the wtxt file
int cycleid = 0;  // id of cycle to load (0 is the first cycle)

char file_name[128]="test_density_a.wdat";

size_t blocklength = (size_t)NX*NY*NZ;  // for datadim=3
size_t blocksize = blocklength*8; // for real variable

double *data = (double *) malloc(blocksize); // buffer for one datablock of real data
if(data==NULL) { printf("ERROR: Cannot allocate memory\n"); exit(1); }

FILE *pFile;

pFile= fopen (file_name, "rb"); // open file
if(pFile==NULL) { printf("ERROR: Cannot open file\n"); exit(1); }

// set pointer to correct location
size_t ptr_shift = 0;
// in case of other formats, like npy:
// ptr_shift += size_of_header; // skip header
ptr_shift += blocksize*cycleid; // offset of the requested datablock
// fseek takes a long offset; very large files may need a 64-bit seek function instead
if(fseek ( pFile, (long) ptr_shift, SEEK_SET ) != 0 ) printf("ERROR: Cannot set pointer to datablock\n");

size_t test_ele = fread (data , blocksize, 1, pFile); // read datablock
if(test_ele != 1) printf("ERROR: Cannot read datablock\n");

fclose(pFile); // close file

The following code snippet demonstrates how to add a new datablock to a wdat file:

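double *data; // pointer to real data: blocklength values to append (filled elsewhere)

char file_name[128]="test_density_a.wdat";

FILE *pFile;

pFile= fopen (file_name, "ab"); // open file in append mode
if(pFile==NULL) { printf("ERROR: Cannot open file\n"); exit(1); } // exit requires <stdlib.h>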
      
size_t blocklength = NX*NY*NZ;  // for datadim=3
size_t blocksize = blocklength*8; // for real variable
size_t test_ele = fwrite (data , blocksize, 1, pFile); // add datablock
if(test_ele != 1) printf("ERROR: Cannot add datablock\n");

// in case of other formats, like npy, additional change of header is required.
//    ...

fclose(pFile); // close file
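
The two operations can also be wrapped into small helper functions, which makes a write-then-read round trip easy to check. This is a sketch under stated assumptions, not the interface of the wdata library: the names wd_append_real_block and wd_read_real_block are hypothetical, only real variables in the header-less wdat format are handled, a double is assumed to occupy 8 bytes, and data are written in the byte order of the machine.

```c
#include <stdio.h>

/* Append one datablock of a real variable to a wdat file.
   Returns 0 on success, -1 on failure. */
int wd_append_real_block(const char *file_name, const double *data,
                         size_t blocklength)
{
    FILE *f = fopen(file_name, "ab");   /* append mode: block goes to the end */
    if (f == NULL) return -1;

    size_t n = fwrite(data, sizeof(double), blocklength, f);
    int rc = (n == blocklength) ? 0 : -1;

    if (fclose(f) != 0) rc = -1;        /* a failed close may mean lost data */
    return rc;
}

/* Read the datablock of cycle cycleid (0 is the first cycle) into data.
   Returns 0 on success, -1 on failure (e.g. the cycle does not exist). */
int wd_read_real_block(const char *file_name, double *data,
                       size_t blocklength, int cycleid)
{
    FILE *f = fopen(file_name, "rb");
    if (f == NULL) return -1;

    /* offset = blocksize*cycleid; long limits this sketch to smaller files */
    long offset = (long)(blocklength * sizeof(double)) * cycleid;
    int rc = 0;

    if (fseek(f, offset, SEEK_SET) != 0) rc = -1;
    else if (fread(data, sizeof(double), blocklength, f) != blocklength) rc = -1;

    fclose(f);
    return rc;
}
```

Calling wd_append_real_block three times and then wd_read_real_block with cycleid=1 returns the second block that was written. After appending, the cycles entry in the wtxt file presumably has to be updated as well so that the metadata stays consistent with the binary file.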

W-data C library

The folder lib/wdata contains a library that provides support for W-data processing.
The following examples demonstrate how to use this library:

example-write.c: creates an artificial set of variables and writes them to wdat files.
example-read.c: reads data from wdat files (created by example-write.c).
example-addvar.c: adds a new variable to an existing data set (created by example-write.c).

To compile the examples (you may need to modify the Makefile first):

[wtools@dell cold-atoms]$ cd lib/wdata/
[wtools@dell lib/wdata]$ make

W-data python library

See W-data Format on PyPI.