# ELPA Library

For static calculations we recommend the ELPA library, which performs better than ScaLAPACK. In particular, ELPA can run the diagonalization on GPUs, which gives a significant speedup: in the benchmarks below, for a matrix of size 221,184, ELPA on GPUs is about 8 times faster than `PZHEEVD` (736 s vs 5995 s). To activate ELPA, set in [predefines.h](https://gitlab.fizyka.pw.edu.pl/gabrielw/wslda/-/tree/public/st-myproject-template/predefines.h):
```c
// select diagonalization routine
#define DIAGONALIZATION_ROUTINE ELPA
```

You also need to carefully check the ELPA settings section of the same file:
```c
// ---------------------- ELPA SETTINGS ---------------------------
// Fill this part only if the ELPA library is used for diagonalization

// define it (uncomment) if you want to use GPUs for diagonalizations
#define ELPA_USE_GPU

// Select ELPA solver and kernels
#define ELPA_USE_SOLVER ELPA_SOLVER_1STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_DEFAULT
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_DEFAULT

// Fraction of eigenvectors to be extracted in each cycle.
// 1.0 corresponds to extraction of all eigenvectors (USE IT IF YOU ARE NOT SURE)
// NOTE: the value of this parameter must ensure that all eigenstates below the requested Ec are extracted.
// NOTE: in the 3D case this value can typically be set to 0.78
#define ELPA_NEV_FRACTION 1.0
```
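The role of `ELPA_NEV_FRACTION` can be illustrated with a short sketch. It assumes, hypothetically, that the toolkit requests `nev = ceil(ELPA_NEV_FRACTION * N)` eigenpairs for an N×N matrix, and it relies on ELPA computing the `nev` lowest eigenpairs. The helpers `elpa_nev` and `min_nev_fraction` are illustrative names, not part of W-SLDA or ELPA; the second one estimates the smallest safe fraction from eigenvalues obtained, for example, in an earlier run with `ELPA_NEV_FRACTION 1.0`.

```c
/* Hypothetical helper (not W-SLDA/ELPA API): number of eigenpairs to
 * request for an n x n matrix, assuming nev = ceil(fraction * n),
 * clamped to the range [1, n]. */
int elpa_nev(int n, double fraction)
{
    double target = fraction * (double)n;
    int nev = (int)target;           /* truncates toward zero */
    if ((double)nev < target)        /* round up to the ceiling */
        nev++;
    if (nev < 1)
        nev = 1;
    if (nev > n)
        nev = n;
    return nev;
}

/* Hypothetical helper: smallest fraction such that the nev lowest
 * eigenpairs include every eigenvalue strictly below the cutoff ec.
 * The eigenvalues may be given in any order. */
double min_nev_fraction(const double *eigenvalues, int n, double ec)
{
    int below = 0;
    for (int i = 0; i < n; i++) {
        if (eigenvalues[i] < ec)
            below++;
    }
    return (double)below / (double)n;
}
```

With N = 221,184 and the typical 3D value 0.78, `elpa_nev` returns 172,524. Because of floating-point rounding, `elpa_nev(n, min_nev_fraction(...))` can exceed the number of states below the cutoff by one, but never falls short of it.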
## Documentation

1. [Eigenvalue SoLvers for Petaflop-Applications (ELPA)](https://elpa.mpcdf.mpg.de/)
2. [Wiki: Eigenvalue SoLvers for Petaflop-Applications (ELPA)](https://gitlab.mpcdf.mpg.de/elpa/elpa/-/wikis/home)
3. [ELPA installation guide](<ELPA installation guide>)

## Publications about ELPA performance

1. [GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems](https://arxiv.org/abs/2002.10991)
# ScaLAPACK library

If the target system does not provide the ELPA library, you can use the standard diagonalization library [ScaLAPACK](http://www.netlib.org/scalapack/). The W-SLDA Toolkit can use the following ScaLAPACK diagonalization routines:

```c
#define DIAGONALIZATION_ROUTINE PZHEEVR
```

or

```c
#define DIAGONALIZATION_ROUTINE PZHEEVD
```

We recommend `PZHEEVR`: it can compute only a selected subset of the eigenstates, and typically we need only a fraction of them, whereas `PZHEEVD` always computes all of them. However, we found that in some rare, system-dependent cases `PZHEEVR` does not work correctly. In such cases use `PZHEEVD`.
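A switch such as `DIAGONALIZATION_ROUTINE` is resolved at compile time by the C preprocessor. The sketch below is a hypothetical illustration of that pattern, not the toolkit's actual code: the numeric values given to `ELPA`, `PZHEEVR` and `PZHEEVD`, the macro `ROUTINE_NAME` and both functions are assumptions.

```c
/* Hypothetical codes for the routines; the real toolkit headers define
 * their own values. Distinct nonzero integers let #if compare them. */
#define ELPA    1
#define PZHEEVR 2
#define PZHEEVD 3

/* Normally set in predefines.h; default to the recommended ScaLAPACK routine. */
#ifndef DIAGONALIZATION_ROUTINE
#define DIAGONALIZATION_ROUTINE PZHEEVR
#endif

#if DIAGONALIZATION_ROUTINE == ELPA
#define ROUTINE_NAME "ELPA"
#elif DIAGONALIZATION_ROUTINE == PZHEEVR
#define ROUTINE_NAME "PZHEEVR (ScaLAPACK, subset of eigenpairs)"
#elif DIAGONALIZATION_ROUTINE == PZHEEVD
#define ROUTINE_NAME "PZHEEVD (ScaLAPACK, all eigenpairs)"
#else
#error "DIAGONALIZATION_ROUTINE must be ELPA, PZHEEVR or PZHEEVD"
#endif

/* Returns the code of the routine chosen at compile time. */
int selected_routine(void)
{
    return DIAGONALIZATION_ROUTINE;
}

/* Returns a human-readable name, e.g. for a log message at startup. */
const char *selected_routine_name(void)
{
    return ROUTINE_NAME;
}
```

The routine names must expand to distinct nonzero integers: inside `#if`, an identifier that is not defined as a macro evaluates to 0, so undefined names would all compare equal and the first branch would always be taken. The `#error` branch turns a misspelled value into a compile-time failure.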
# Benchmarks

| matrix size | p | q | mb | nb | precision | routine | system | time [s] | cost [node-hours] |
|-------------|---|---|----|----|-----------|---------|--------|----------|-------------------|
| **221,184 = 2×48³** | 46 | 84 | 16 | 16 | complex | ELPA (1-GPU) | Summit | **736** | **18.8** |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (2-GPU) | Summit | 3,098 | 79.2 |
| 221,184 | 46 | 84 | 16 | 16 | complex | PZHEEVD | Summit | 5,995 | 153.2 |
| **524,288 = 2×64³** | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | **2,217** | **157.7** |
| **746,496 = 2×72³** | 112 | 192 | 16 | 16 | complex | ELPA (1-GPU) | Summit | **3,436** | **488.7** |
| **128,000 = 2×40³** | 20 | 20 | 32 | 32 | complex | ELPA (1-GPU) | Daint | **220** | **24.4** |
| 128,000 | 54 | 64 | 32 | 32 | complex | ELPA (2-CPU) | Daint | 677 | 54.1 |
| 128,000 | 54 | 64 | 32 | 32 | complex | PZHEEVR | Daint | 945 | 75.6 |
| **147,456 = 4×64×24²** | 24 | 25 | 32 | 32 | complex | ELPA (1-GPU) | Daint | **375** | **62.5** |

p × q is the MPI process grid and mb × nb the block size of the 2D block-cyclic matrix distribution.

(1-GPU): `ELPA_SOLVER_1STAGE`, `ELPA_2STAGE_COMPLEX_GPU`

(2-GPU): `ELPA_SOLVER_2STAGE`, `ELPA_2STAGE_COMPLEX_GPU`

(2-CPU): `ELPA_SOLVER_2STAGE`, `ELPA_2STAGE_COMPLEX_DEFAULT`
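The cost column is consistent with node-hours, i.e. the number of nodes multiplied by the wall time. The sketch below reproduces that arithmetic. The ranks-per-node values are our assumption, inferred because they reproduce the table to within rounding (42 MPI ranks per node for the Summit runs, 1 per node for the Daint GPU runs, 12 per node for the Daint CPU runs); they are not stated on this page. `node_hours` is a hypothetical helper.

```c
/* Hypothetical helper: cost in node-hours of a run on a p x q process
 * grid, assuming ranks_per_node MPI ranks are placed on each node. */
double node_hours(int p, int q, int ranks_per_node, double seconds)
{
    int ranks = p * q;
    /* round up: a partially filled node is still counted as a full node */
    int nodes = (ranks + ranks_per_node - 1) / ranks_per_node;
    return (double)nodes * seconds / 3600.0;
}
```

For the first Summit row, `node_hours(46, 84, 42, 736.0)` corresponds to 92 nodes and gives about 18.8 node-hours; for the first Daint row, `node_hours(20, 20, 1, 220.0)` gives about 24.4.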