Setting up diagonalization engine

ELPA Library
- Documentation
- Publications about ELPA performance
ScaLAPACK library
Benchmarks & Scalings
- Table
- Plots
  - Summit

ELPA Library

For static calculations, we recommend the ELPA library, which performs better than ScaLAPACK (see Benchmarks & Scalings). In particular, ELPA can use GPUs, which significantly speeds up the calculations. To activate ELPA, set the following in predefines.h:

// select diagonalization routine
#define DIAGONALIZATION_ROUTINE ELPA

In addition, carefully review the following part of the file:

// ---------------------- ELPA SETTINGS ---------------------------
// Fill this part only if ELPA library is used for diagonalization

// uncomment it if you want to activate GPU for diagonalizations 
#define ELPA_USE_GPU

// Select ELPA kernels
#define ELPS_USE_SOLVER ELPA_SOLVER_1STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_DEFAULT
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_DEFAULT

// Fraction of eigenvectors to be extracted in each cycle.
// 1.0 corresponds to extraction of all eigenvectors (USE IT IF YOU ARE NOT SURE)
// NOTE: value of this parameter should ensure that all eigenstates below requested Ec are extracted.
// NOTE: For 3D case this value typically can be set to 0.78
#define ELPA_NEV_FRACTION 1.0
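To get a feel for what ELPA_NEV_FRACTION means in practice, the sketch below converts a fraction into a number of eigenvectors for a given matrix size. The helper name nev_from_fraction and the rounding rule are assumptions made for illustration only; the toolkit's own conversion may differ.

```c
/* Hypothetical helper (not part of W-SLDA Toolkit): number of eigenvectors
 * requested for a matrix of dimension n when only a fraction of them is
 * extracted. Rounding to the nearest integer is an assumption of this sketch. */
long nev_from_fraction(long n, double fraction)
{
    if (n <= 0 || fraction <= 0.0)
        return 0;                 /* nothing to extract */
    if (fraction >= 1.0)
        return n;                 /* 1.0 means all eigenvectors */

    long nev = (long)(fraction * (double)n + 0.5);  /* round to nearest */
    return nev > n ? n : nev;
}
```

For example, for a matrix of size 128,000 a fraction of 0.78 corresponds to about 99,840 eigenvectors. Whether that is enough to cover all eigenstates below the requested Ec has to be checked for the system at hand.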

Documentation

Publications about ELPA performance

GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems

ScaLAPACK library

If the target system does not provide the ELPA library, you can use the standard diagonalization library, ScaLAPACK. W-SLDA Toolkit can use the following ScaLAPACK diagonalization routines:

#define DIAGONALIZATION_ROUTINE PZHEEVR

#define DIAGONALIZATION_ROUTINE PZHEEVD

PZHEEVR is recommended. This routine takes advantage of the fact that typically only a fraction of the eigenstates is extracted. However, in some rare, system-dependent cases it does not work correctly. In such cases, use PZHEEVD instead.
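In practice the choice comes down to a single active line in predefines.h. A minimal sketch of such a fragment, with the fallback kept as a comment so that it is easy to switch:

```c
// select diagonalization routine (only one definition should be active)
#define DIAGONALIZATION_ROUTINE PZHEEVR

// fallback if PZHEEVR does not work correctly for your system:
// #define DIAGONALIZATION_ROUTINE PZHEEVD
```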

Benchmarks & Scalings

All tests correspond to the extraction of all eigenvectors.

Table

matrix size	p	q	mb	nb	prec.	routine	system	time [sec]	cost [nh = node-hours]
32,768 = 2x128^2	6	8	16	16	real	ELPA (2-GPU)	Cygnus	93	0.052 nh
45,000 = 2x150^2	6	8	16	16	real	ELPA (2-GPU)	Cygnus	217	0.12 nh
45,000 = 2x150^2	6	8	16	16	complex	ELPA (2-GPU)	Cygnus	860	0.478 nh
65,536 = 2x32^3	24	28	32	32	complex	ELPA (2-GPU)	Summit	115	0.52 nh
65,536 = 2x32^3	24	28	8	8	complex	ELPA (1-GPU)	Summit	118	0.52 nh
128,000 = 2x40^3	24	28	8	8	complex	ELPA (1-GPU)	Summit	435	1.93 nh
128,000	24	28	32	32	complex	ELPA (2-GPU)	Summit	511	2.27 nh
128,000 = 2x40^3	20	20	32	32	complex	ELPA (1-GPU)	Daint	220	24.4 nh
128,000	54	64	32	32	complex	ELPA (2-CPU)	Daint	677	54.1 nh
128,000	54	64	32	32	complex	PZHEEVR	Daint	945	75.6 nh
147,456 = 4x64x24^2	24	25	32	32	complex	ELPA (1-GPU)	Daint	375	62.5 nh
147,456 = 2x768x96	18	18	16	16	double	ELPA (1-GPU)	Daint	395	35.6 nh
221,184 = 2x48^3	46	84	32	32	complex	ELPA (2-GPU)	Summit	603	15.4 nh
221,184	46	84	16	16	complex	ELPA (1-GPU)	Summit	736	18.8 nh
221,184	46	84	16	16	complex	ELPA (2-GPU)	Summit	3098	79.2 nh
221,184	46	84	16	16	complex	PZHEEVD	Summit	5995	153.2 nh
500,000 = 2x50^2x100	96	112	16	16	complex	ELPA (1-GPU)	Summit	2,109	150.0 nh
524,288 = 2x64^3	96	112	16	16	complex	ELPA (1-GPU)	Summit	2,217	157.7 nh
746,496 = 2x72^3	112	192	16	16	complex	ELPA (1-GPU)	Summit	3,436	488.7 nh
746,496	112	192	64	64	complex	ELPA (2-GPU)	Summit	3,628	516.0 nh
1,769,472 = 2x96^3	300	560	32	32	complex	ELPA (1-GPU)	Summit	52,024	57,804 nh

(1-GPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(1-CPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
(2-GPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(2-CPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
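The cost column appears to be expressed in node-hours (nh), that is, the number of nodes multiplied by the wall time in hours. A minimal sketch of this arithmetic follows. The function name is hypothetical, and the node count is an input that the table does not list, so any node count used with it is an assumption.

```c
/* Hypothetical helper: cost in node-hours of a run that occupies
 * `nodes` nodes for `seconds` seconds of wall time. */
double node_hours(int nodes, double seconds)
{
    return (double)nodes * seconds / 3600.0;
}
```

As a consistency check, the last row (52,024 sec, 57,804 nh) is reproduced with 4,000 nodes, which is what p x q = 300 x 560 = 168,000 processes would occupy at 42 processes per node. That per-node count is inferred from the numbers and is not stated in the table.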

Plots

These scalings are derived empirically: points correspond to actual measurements on the target system, while the line shows a fit of the ideal scaling for level-3 routines (\sim N^3).
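The ideal cubic law can also be used for a rough extrapolation from one measured point, assuming the same machine, process grid, and solver settings. The sketch below does only that; the function name is hypothetical, and real timings can deviate from the ideal N^3 line.

```c
/* Hypothetical helper: extrapolate the diagonalization time assuming ideal
 * cubic scaling, t(n) = t_ref * (n / n_ref)^3, on fixed resources. */
double cubic_time_estimate(double t_ref, double n_ref, double n)
{
    double r = n / n_ref;      /* ratio of matrix sizes */
    return t_ref * r * r * r;  /* level-3 cost grows as N^3 */
}
```

Under this assumption, doubling the matrix size on the same resources increases the expected time eightfold.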

Summit

The scaling was obtained within the ALCC grant "Quantum Turbulence in Fermi Superfluids".

Raw data: summit-scaling.txt
Gnuplot script: summit-scaling.gp