Diagonalization engine
To diagonalize the BdG matrix, the W-SLDA Toolkit can use either of two libraries: ELPA or ScaLAPACK.
The diagonalization library is selected in the machine.h file:
/**
* Select diagonalization routine
* ELPA demonstrates the best performance; use it if the target system supports this library.
* Otherwise use standard ScaLapack lib (PZHEEV?).
* In case of ScaLapack, it is recommended to use PZHEEVR, unless this routine does not work correctly (it may happen on some systems)
* For more info, see: Wiki -> Setting up diagonalization engine
* */
#define DIAGONALIZATION_ROUTINE PZHEEVR
// #define DIAGONALIZATION_ROUTINE PZHEEVD
// #define DIAGONALIZATION_ROUTINE ELPA
ELPA Library
For static calculations, we recommend the ELPA library, which performs better than ScaLAPACK. In particular, ELPA supports GPUs, which speed up the diagonalization significantly. Once the ELPA library is activated
// #define DIAGONALIZATION_ROUTINE PZHEEVR
// #define DIAGONALIZATION_ROUTINE PZHEEVD
#define DIAGONALIZATION_ROUTINE ELPA
you also need to carefully review the ELPA settings section:
/**
* ---------------------- ELPA SETTINGS ---------------------------
* Fill this part only if the ELPA library is used for diagonalization
*
* Default settings are: ELPA_SOLVER_1STAGE
* but you can overwrite using the options below
* */
/**
* Uncomment it if you want to activate GPUs for diagonalizations
* */
// #define ELPA_USE_GPU
/**
* Select ELPA kernels,
* for more info, see the documentation of the ELPA lib
* */
// #define ELPA_USE_SOLVER ELPA_SOLVER_2STAGE
// #define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_GPU
// #define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_GPU
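As an illustration, the fragment below sketches how the same settings might look once the two-stage solver with GPU kernels is enabled, that is, the combination labelled (2-GPU) in the benchmarks. It only uncomments macros that already appear in the excerpts above. Whether ELPA_USE_GPU must be defined together with the GPU kernels is an assumption here, so check the comments in your own machine.h and the ELPA documentation.

```c
/* Sketch of machine.h settings for ELPA, two-stage solver, GPU kernels. */
/* Assumption: all three ELPA_USE_* overrides are set together with ELPA_USE_GPU. */
// #define DIAGONALIZATION_ROUTINE PZHEEVR
// #define DIAGONALIZATION_ROUTINE PZHEEVD
#define DIAGONALIZATION_ROUTINE ELPA

#define ELPA_USE_GPU

#define ELPA_USE_SOLVER ELPA_SOLVER_2STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_GPU
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_GPU
```

Leaving the three ELPA_USE_* lines commented out keeps the default ELPA_SOLVER_1STAGE described in the settings comment.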
Further materials for ELPA Library
Documentation
- Eigenvalue SoLvers for Petaflop-Applications (ELPA)
- Wiki: Eigenvalue SoLvers for Petaflop-Applications (ELPA)
- ELPA installation guide
Publications about ELPA performance
- GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems
- Fermionic quantum turbulence: Pushing the limits of high-performance computing
ScaLapack library
If the target system does not provide the ELPA library, the standard ScaLAPACK library can be used instead. The W-SLDA Toolkit supports the following ScaLAPACK diagonalization routines:
#define DIAGONALIZATION_ROUTINE PZHEEVR
or
#define DIAGONALIZATION_ROUTINE PZHEEVD
PZHEEVR is recommended because it exploits the fact that, in practice, only a fraction of the eigenstates is typically extracted. However, in rare, system-dependent cases this routine does not work correctly. In such cases, use PZHEEVD.
Benchmarks & Scalings
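All tests correspond to the extraction of all eigenvectors. In the table, p and q are the dimensions of the process grid, mb and nb are the block sizes of the block-cyclic distribution, and the cost is given in node-hours (nh).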
Table
| matrix size | p | q | mb | nb | prec. | routine | system | time [sec] | cost |
|---|---|---|---|---|---|---|---|---|---|
| 32,768 = 2x128^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 93 | 0.052 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 217 | 0.12 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | complex | ELPA (2-GPU) | Cygnus | 860 | 0.478 nh |
| 65,536 = 2x32^3 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 115 | 0.52 nh |
| 65,536 = 2x32^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 118 | 0.52 nh |
| 128,000 = 2x40^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 435 | 1.93 nh |
| 128,000 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 511 | 2.27 nh |
| 128,000 = 2x40^3 | 20 | 20 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 220 | 24.4 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | ELPA (2-CPU) | Daint | 677 | 54.1 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | PZHEEVR | Daint | 945 | 75.6 nh |
| 147,456 = 4x64x24^2 | 24 | 25 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 375 | 62.5 nh |
| 147,456 = 2x768x96 | 18 | 18 | 16 | 16 | double | ELPA (1-GPU) | Daint | 395 | 35.6 nh |
| 221,184 = 2x48^3 | 46 | 84 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 603 | 15.4 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 736 | 18.8 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (2-GPU) | Summit | 3098 | 79.2 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | PZHEEVD | Summit | 5995 | 153.2 nh |
| 500,000 = 2x50^2x100 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,109 | 150.0 nh |
| 524,288 = 2x64^3 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,217 | 157.7 nh |
| 746,496 = 2x72^3 | 112 | 192 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 3,436 | 488.7 nh |
| 746,496 | 112 | 192 | 64 | 64 | complex | ELPA (2-GPU) | Summit | 3,628 | 516.0 nh |
| 1,769,472 = 2x96^3 | 300 | 560 | 32 | 32 | complex | ELPA (1-GPU) | Summit | 52,024 | 57,804 nh |
(1-GPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(1-CPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
(2-GPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(2-CPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
Plots
These scalings are derived empirically: the points correspond to actual measurements on the target system, while the line shows a fit of the ideal scaling for level-3 routines ($\sim N^3$).
Summit
The scaling was derived within the ALCC grant Quantum Turbulence in Fermi Superfluids.
- Raw data: summit-scaling.txt
- Gnuplot script: summit-scaling.gp
