Last edited by Gabriel Wlazłowski May 20, 2021

Setting up diagonalization engine

  • ELPA Library
    • Documentation
    • Publications about ELPA performance
  • ScaLapack library
  • Benchmarks & Scalings
    • Table
    • Plots
      • Summit

ELPA Library

For static calculations, it is recommended to use the ELPA library, which performs better than ScaLAPACK. In particular, ELPA can use GPUs, which significantly speeds up the diagonalization. To activate ELPA, set in predefines.h:

// select diagonalization routine
#define DIAGONALIZATION_ROUTINE ELPA

In addition, carefully review the following section of predefines.h:

// ---------------------- ELPA SETTINGS ---------------------------
// Fill in this part only if the ELPA library is used for diagonalization

// define this macro to use GPUs for diagonalization (comment it out to run on CPUs only)
#define ELPA_USE_GPU

// Select ELPA solver and kernels
#define ELPS_USE_SOLVER ELPA_SOLVER_1STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_DEFAULT
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_DEFAULT

// Fraction of eigenvectors to be extracted in each cycle.
// 1.0 corresponds to extraction of all eigenvectors (USE IT IF YOU ARE NOT SURE)
// NOTE: the value of this parameter must ensure that all eigenstates below the requested Ec are extracted.
// NOTE: for 3D cases this value can typically be set to 0.78
#define ELPA_NEV_FRACTION 1.0
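One way to pick ELPA_NEV_FRACTION safely is sketched below. ELPA computes the nev lowest eigenpairs, so every eigenstate with energy below Ec is extracted only if nev is at least the number of eigenvalues below Ec. The helpers min_nev_fraction and nev_from_fraction are hypothetical (they are not part of W-SLDA or ELPA), and the sketch assumes you have the eigenvalues of a reference run with all eigenvectors (ELPA_NEV_FRACTION 1.0) at hand.

```c
/* Hypothetical helper (not part of W-SLDA): smallest fraction of the n
 * eigenpairs that still contains every eigenvalue strictly below ec.
 * Assumes the solver returns the lowest eigenpairs first, as ELPA does. */
double min_nev_fraction(const double *eigenvalues, int n, double ec)
{
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (eigenvalues[i] < ec) {
            ++count;
        }
    }
    return (double)count / (double)n;
}

/* Hypothetical helper: number of eigenvectors nev requested for a given
 * fraction of a matrix of size n, rounded up so the fraction is never
 * undershot, and clamped to the range [0, n]. */
int nev_from_fraction(double fraction, int n)
{
    /* small tolerance so that rounding noise such as 7.000000000000001
     * does not push nev one above the intended value */
    double target = fraction * (double)n - 1e-9;
    int nev = (int)target;          /* truncation toward zero */
    if ((double)nev < target) {
        ++nev;                      /* round up to the next integer */
    }
    if (nev < 0) {
        nev = 0;
    }
    if (nev > n) {
        nev = n;
    }
    return nev;
}
```

For example, with the eight eigenvalues {-3, -1, 0.5, 2, 4, 6, 8, 10} and Ec = 3, min_nev_fraction returns 0.5, and nev_from_fraction(0.78, 65536) gives 51119. In practice one would add a safety margin on top of the minimal fraction, since the spectrum changes between iterations.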

Documentation

  1. Eigenvalue SoLvers for Petaflop-Applications (ELPA)
  2. Wiki: Eigenvalue SoLvers for Petaflop-Applications (ELPA)
  3. ELPA installation guide

Publications about ELPA performance

  1. GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems

ScaLapack library

If the ELPA library is not available on the target system, the user can fall back on the standard diagonalization library ScaLAPACK. The W-SLDA Toolkit can use the following ScaLAPACK diagonalization routines:

#define DIAGONALIZATION_ROUTINE PZHEEVR

or

#define DIAGONALIZATION_ROUTINE PZHEEVD

It is recommended to use PZHEEVR. This routine takes advantage of the fact that typically only a fraction of the eigenstates is extracted. However, we have found that in some rare (system-dependent) cases this routine does not work correctly. In such cases, PZHEEVD should be used.
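The choice of engine is a compile-time switch. The sketch below illustrates how such a switch can be resolved by the C preprocessor, including a guard that rejects unsupported values. The numeric codes for ELPA, PZHEEVR and PZHEEVD and the functions diagonalization_engine_id and diagonalization_engine_name are hypothetical; the toolkit's own definitions are not shown on this page and may differ.

```c
/* Hypothetical numeric codes for the engines; the real W-SLDA
 * definitions are not reproduced here. */
#define ELPA    1
#define PZHEEVR 2
#define PZHEEVD 3

/* In the toolkit this value comes from predefines.h. */
#ifndef DIAGONALIZATION_ROUTINE
#define DIAGONALIZATION_ROUTINE PZHEEVR
#endif

/* Fail at compile time if an unsupported engine was selected. */
#if DIAGONALIZATION_ROUTINE != ELPA && \
    DIAGONALIZATION_ROUTINE != PZHEEVR && \
    DIAGONALIZATION_ROUTINE != PZHEEVD
#error "DIAGONALIZATION_ROUTINE must be ELPA, PZHEEVR or PZHEEVD"
#endif

int diagonalization_engine_id(void)
{
    return DIAGONALIZATION_ROUTINE;
}

const char *diagonalization_engine_name(void)
{
#if DIAGONALIZATION_ROUTINE == ELPA
    return "ELPA";
#elif DIAGONALIZATION_ROUTINE == PZHEEVR
    /* MRRR algorithm; can compute a subset of the eigenpairs */
    return "ScaLAPACK PZHEEVR";
#else
    /* divide-and-conquer algorithm; computes all eigenpairs */
    return "ScaLAPACK PZHEEVD";
#endif
}
```

Resolving the choice in the preprocessor means only the selected engine is compiled in, so switching from PZHEEVR to PZHEEVD requires a rebuild.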

Benchmarks & Scalings

All tests correspond to the extraction of all eigenvectors.

Table

matrix size            p    q    mb  nb  prec.    routine       system  time [sec]  cost [node-hours]
32,768 = 2x128^2       6    8    16  16  real     ELPA (2-GPU)  Cygnus  93          0.052
45,000 = 2x150^2       6    8    16  16  real     ELPA (2-GPU)  Cygnus  217         0.12
45,000 = 2x150^2       6    8    16  16  complex  ELPA (2-GPU)  Cygnus  860         0.478
65,536 = 2x32^3        24   28   32  32  complex  ELPA (2-GPU)  Summit  115         0.52
65,536 = 2x32^3        24   28   8   8   complex  ELPA (1-GPU)  Summit  118         0.52
128,000 = 2x40^3       24   28   8   8   complex  ELPA (1-GPU)  Summit  435         1.93
128,000                24   28   32  32  complex  ELPA (2-GPU)  Summit  511         2.27
128,000 = 2x40^3       20   20   32  32  complex  ELPA (1-GPU)  Daint   220         24.4
128,000                54   64   32  32  complex  ELPA (2-CPU)  Daint   677         54.1
128,000                54   64   32  32  complex  PZHEEVR       Daint   945         75.6
147,456 = 4x64x24^2    24   25   32  32  complex  ELPA (1-GPU)  Daint   375         62.5
147,456 = 2x768x96     18   18   16  16  double   ELPA (1-GPU)  Daint   395         35.6
221,184 = 2x48^3       46   84   32  32  complex  ELPA (2-GPU)  Summit  603         15.4
221,184                46   84   16  16  complex  ELPA (1-GPU)  Summit  736         18.8
221,184                46   84   16  16  complex  ELPA (2-GPU)  Summit  3098        79.2
221,184                46   84   16  16  complex  PZHEEVD       Summit  5995        153.2
500,000 = 2x50^2x100   96   112  16  16  complex  ELPA (1-GPU)  Summit  2,109       150.0
524,288 = 2x64^3       96   112  16  16  complex  ELPA (1-GPU)  Summit  2,217       157.7
746,496 = 2x72^3       112  192  16  16  complex  ELPA (1-GPU)  Summit  3,436       488.7
746,496                112  192  64  64  complex  ELPA (2-GPU)  Summit  3,628       516.0
1,769,472 = 2x96^3     300  560  32  32  complex  ELPA (1-GPU)  Summit  52,024      57,804

p, q: dimensions of the process grid; mb, nb: block sizes of the 2D block-cyclic matrix distribution.

Routine labels (solver, complex or real kernel):
(1-GPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(1-CPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
(2-GPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(2-CPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
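The cost column appears to follow cost = nodes x time / 3600, with nodes = p*q divided by the number of MPI ranks per node. The ranks-per-node values used below are inferred from the table, not stated on this page: 42 reproduces the Summit rows, 1 the Daint GPU rows and 12 the Daint CPU rows. The function cost_node_hours is a hypothetical helper for estimating the budget of a planned run under these assumptions.

```c
/* Hypothetical helper: estimated cost in node-hours of a run on a
 * p x q process grid with ranks_per_node MPI ranks per node
 * (ranks_per_node is an assumption inferred from the benchmark table). */
double cost_node_hours(int p, int q, int ranks_per_node, double time_sec)
{
    int ranks = p * q;
    /* round up: a partially filled node is still charged as a full node */
    int nodes = (ranks + ranks_per_node - 1) / ranks_per_node;
    return (double)nodes * time_sec / 3600.0;
}
```

For the 524,288 row on Summit, 96 x 112 = 10,752 ranks at 42 per node gives 256 nodes, and 256 x 2,217 s / 3600 is about 157.7 node-hours, matching the table.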

Plots

These scalings are derived empirically: points correspond to real measurements on the target system, while the line shows a fit of the ideal scaling for level-3 BLAS routines, $\sim N^3$, where $N$ is the matrix size.
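A fit of the form $y = c N^3$ has a closed-form least-squares solution, $c = \sum_i y_i N_i^3 / \sum_i N_i^6$. The sketch below implements that one-parameter fit; which quantity the plot actually fits (time or cost) is not stated here, so y is left generic, and the function names are hypothetical.

```c
/* Hypothetical helper: least-squares estimate of c in y = c * N^3,
 * minimising sum_i (y_i - c N_i^3)^2, which gives
 * c = sum_i y_i N_i^3 / sum_i N_i^6. */
double fit_cubic_prefactor(const double *n, const double *y, int count)
{
    double num = 0.0;
    double den = 0.0;
    for (int i = 0; i < count; ++i) {
        double n3 = n[i] * n[i] * n[i];
        num += y[i] * n3;
        den += n3 * n3;
    }
    return num / den;
}

/* Prediction of the fitted model for a matrix of size n. */
double predict_cubic(double c, double n)
{
    return c * n * n * n;
}
```

Note that this linear-space fit weights the largest matrices most heavily; a fit of log y against log N with the slope fixed at 3 weights all sizes more evenly.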

Summit

The scaling was derived within the ALCC grant "Quantum Turbulence in Fermi Superfluids".

  • Raw data: summit-scaling.txt
  • Gnuplot script: summit-scaling.gp

[Figure: summit-scaling]