Last edited by Gabriel Wlazłowski Feb 20, 2026

Setting up diagonalization engine

  • Diagonalization engine
  • ELPA Library
    • Further materials for ELPA Library
      • Documentation
      • Publications about ELPA performance
  • ScaLAPACK library
  • Benchmarks & Scalings
    • Table
    • Plots
      • Summit

Diagonalization engine

To diagonalize the BdG matrix, the W-SLDA Toolkit can use one of the following libraries:

  • ScaLAPACK (routines: PZHEEVR or PZHEEVD)
  • ELPA

The diagonalization library is selected in the machine.h file:

/**
 * Select diagonalization routine
 * ELPA demonstrates the best performance; use it if the target system supports this library.
 * Otherwise use standard ScaLapack lib (PZHEEV?).
 * In case of ScaLapack, it is recommended to use PZHEEVR, unless this routine does not work correctly (it may happen on some systems)
 * For more info, see: Wiki -> Setting up diagonalization engine
 * */
#define DIAGONALIZATION_ROUTINE PZHEEVR
// #define DIAGONALIZATION_ROUTINE PZHEEVD
// #define DIAGONALIZATION_ROUTINE ELPA
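
As an illustration of how a compile-time switch of this kind is typically consumed, the sketch below maps DIAGONALIZATION_ROUTINE to a routine name with the preprocessor. The numeric tag values and the function diag_routine_name are hypothetical and are not taken from the W-SLDA sources; the toolkit's actual dispatch may differ.

```c
/* Hypothetical numeric tags so that the preprocessor can compare them.
 * The values used inside the W-SLDA Toolkit are not shown on this page. */
#define PZHEEVR 1
#define PZHEEVD 2
#define ELPA    3

/* Same form as the active line in machine.h */
#define DIAGONALIZATION_ROUTINE PZHEEVR

/* Return the name of the routine selected at compile time (illustrative helper). */
const char *diag_routine_name(void)
{
#if DIAGONALIZATION_ROUTINE == ELPA
    return "ELPA";
#elif DIAGONALIZATION_ROUTINE == PZHEEVD
    return "PZHEEVD";
#elif DIAGONALIZATION_ROUTINE == PZHEEVR
    return "PZHEEVR";
#else
#error "Unknown DIAGONALIZATION_ROUTINE"
#endif
}
```

Because the choice is resolved by the preprocessor, switching the engine requires recompiling the code after editing machine.h.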

ELPA Library

For static calculations, it is recommended to use the ELPA library, which performs better than ScaLAPACK. In particular, ELPA supports GPUs, which significantly speed up the calculations. Once the ELPA library is activated

// #define DIAGONALIZATION_ROUTINE PZHEEVR
// #define DIAGONALIZATION_ROUTINE PZHEEVD
#define DIAGONALIZATION_ROUTINE ELPA

you also need to carefully inspect the ELPA settings section:

/**
 * ---------------------- ELPA SETTINGS ---------------------------
 * Fill this part only if the ELPA library is used for diagonalization
 * 
 * Default settings are: ELPA_SOLVER_1STAGE
 * but you can overwrite using the options below
 * */

/**
 * Uncomment it if you want to activate GPUs for diagonalizations 
 * */
// #define ELPA_USE_GPU

/**
 * Select ELPA kernels,
 * for more info, see the  documentation of the ELPA lib
 * */
// #define ELPA_USE_SOLVER ELPA_SOLVER_2STAGE
// #define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_GPU
// #define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_GPU
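
For instance, the (2-GPU) configuration listed in the benchmark legend on this page combines the two-stage solver with the GPU kernels. A minimal sketch of the ELPA section for that case is given below. Whether ELPA_USE_GPU has to be defined together with the GPU kernels is an assumption here, so check the ELPA documentation for the installed version.

```c
/* Sketch: two-stage ELPA solver with GPU kernels, i.e. the "(2-GPU)" benchmark setup. */
#define ELPA_USE_GPU /* assumption: needed to enable GPU offload */
#define ELPA_USE_SOLVER ELPA_SOLVER_2STAGE
#define ELPA_USE_COMPLEX_KERNEL ELPA_2STAGE_COMPLEX_GPU
#define ELPA_USE_REAL_KERNEL ELPA_2STAGE_REAL_GPU
```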

Further materials for ELPA Library

Documentation

  1. Eigenvalue SoLvers for Petaflop-Applications (ELPA)
  2. Wiki: Eigenvalue SoLvers for Petaflop-Applications (ELPA)
  3. ELPA installation guide

Publications about ELPA performance

  1. GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems
  2. Fermionic quantum turbulence: Pushing the limits of high-performance computing

ScaLAPACK library

If the target system does not provide the ELPA library, the standard diagonalization library ScaLAPACK can be used instead. The W-SLDA Toolkit supports the following ScaLAPACK diagonalization engines:

#define DIAGONALIZATION_ROUTINE PZHEEVR

or

#define DIAGONALIZATION_ROUTINE PZHEEVD

It is recommended to use PZHEEVR. This engine exploits the fact that, in practice, we typically extract only a fraction of the eigenstates. However, in rare, system-dependent cases this routine does not work correctly. In such a case, PZHEEVD should be used.
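
To illustrate what extracting only a fraction of the eigenstates means in practice: ScaLAPACK's PZHEEVR accepts RANGE='I' together with 1-based indices IL and IU, and then computes only the eigenpairs IL..IU, counted in ascending order of eigenvalue. The helper below is a hypothetical sketch that turns a requested fraction of the upper part of the spectrum into such an index pair. It is not the selection logic of the W-SLDA Toolkit, and the names eig_range and eig_index_range are invented for this example.

```c
/* Index range of the requested eigenpairs, 1-based as in RANGE='I'. */
typedef struct {
    int il; /* index of the first requested eigenvalue */
    int iu; /* index of the last requested eigenvalue  */
} eig_range;

/* Select the upper `fraction` (0 < fraction <= 1) of an n x n spectrum.
 * Eigenvalues are numbered 1..n in ascending order. */
eig_range eig_index_range(int n, double fraction)
{
    eig_range r;
    int count = (int)(fraction * n); /* number of requested eigenpairs */

    if (count < 1) count = 1; /* always request at least one eigenpair */
    if (count > n) count = n; /* never request more than the matrix has */

    r.il = n - count + 1;
    r.iu = n;
    return r;
}
```

For a matrix of size 128,000 and fraction 0.5, this gives IL = 64,001 and IU = 128,000.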

Benchmarks & Scalings

All tests correspond to the extraction of all eigenvectors.

Table

| matrix size | p | q | mb | nb | prec. | routine | system | time [sec] | cost |
|---|---|---|---|---|---|---|---|---|---|
| 32,768 = 2x128^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 93 | 0.052 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | real | ELPA (2-GPU) | Cygnus | 217 | 0.12 nh |
| 45,000 = 2x150^2 | 6 | 8 | 16 | 16 | complex | ELPA (2-GPU) | Cygnus | 860 | 0.478 nh |
| 65,536 = 2x32^3 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 115 | 0.52 nh |
| 65,536 = 2x32^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 118 | 0.52 nh |
| 128,000 = 2x40^3 | 24 | 28 | 8 | 8 | complex | ELPA (1-GPU) | Summit | 435 | 1.93 nh |
| 128,000 | 24 | 28 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 511 | 2.27 nh |
| 128,000 = 2x40^3 | 20 | 20 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 220 | 24.4 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | ELPA (2-CPU) | Daint | 677 | 54.1 nh |
| 128,000 | 54 | 64 | 32 | 32 | complex | PZHEEVR | Daint | 945 | 75.6 nh |
| 147,456 = 4x64x24^2 | 24 | 25 | 32 | 32 | complex | ELPA (1-GPU) | Daint | 375 | 62.5 nh |
| 147,456 = 2x768x96 | 18 | 18 | 16 | 16 | double | ELPA (1-GPU) | Daint | 395 | 35.6 nh |
| 221,184 = 2x48^3 | 46 | 84 | 32 | 32 | complex | ELPA (2-GPU) | Summit | 603 | 15.4 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 736 | 18.8 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | ELPA (2-GPU) | Summit | 3098 | 79.2 nh |
| 221,184 | 46 | 84 | 16 | 16 | complex | PZHEEVD | Summit | 5995 | 153.2 nh |
| 500,000 = 2x50^2x100 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,109 | 150.0 nh |
| 524,288 = 2x64^3 | 96 | 112 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 2,217 | 157.7 nh |
| 746,496 = 2x72^3 | 112 | 192 | 16 | 16 | complex | ELPA (1-GPU) | Summit | 3,436 | 488.7 nh |
| 746,496 | 112 | 192 | 64 | 64 | complex | ELPA (2-GPU) | Summit | 3,628 | 516.0 nh |
| 1,769,472 = 2x96^3 | 300 | 560 | 32 | 32 | complex | ELPA (1-GPU) | Summit | 52,024 | 57,804 nh |

(1-GPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(1-CPU): ELPA_SOLVER_1STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
(2-GPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_GPU or ELPA_2STAGE_REAL_GPU
(2-CPU): ELPA_SOLVER_2STAGE, ELPA_2STAGE_COMPLEX_DEFAULT or ELPA_2STAGE_REAL_DEFAULT
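
The cost column can be read as the wall time multiplied by the number of nodes used. Interpreting "nh" as node-hours is an assumption, and the table does not list node counts, so the number of nodes has to be supplied by the user. A minimal sketch of that arithmetic, with the invented name cost_node_hours:

```c
/* Cost of one diagonalization in node-hours: nodes * wall time [s] / 3600.
 * Assumes "nh" in the table means node-hours. */
double cost_node_hours(int nodes, double time_sec)
{
    return nodes * time_sec / 3600.0;
}
```

With a hypothetical allocation of 16 nodes, a 115 s run costs about 0.51 node-hours.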

Plots

These scalings are derived empirically: the points are actual measurements on the target system, while the line is a fit of the ideal scaling for level-3 routines ($\sim N^3$).
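
The ideal $\sim N^3$ scaling gives a quick way to extrapolate a measured time to another matrix size on unchanged resources. The function below is a minimal sketch of that estimate. The name predict_time_cubic is invented for this example, and measured timings deviate from the ideal curve, so treat the result as a rough guide only.

```c
/* Extrapolate wall time assuming ideal cubic scaling, t ~ N^3,
 * on an unchanged set of resources. */
double predict_time_cubic(double t_ref_sec, double n_ref, double n_new)
{
    double ratio = n_new / n_ref; /* relative change of the matrix size */
    return t_ref_sec * ratio * ratio * ratio;
}
```

Doubling the matrix size multiplies the estimated time by 8, for example from 100 s to 800 s.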

Summit

The scaling was derived within the ALCC grant Quantum Turbulence in Fermi Superfluids.

  • Raw data: summit-scaling.txt
  • Gnuplot script: summit-scaling.gp

[Figure: summit-scaling]