# Parallelization scheme of time dependent codes
# General info
The spirit of the SLDA method is to exploit only local densities during the computation process. This feature makes the method an excellent candidate for multithreaded computing units such as GPUs. Instead of iterating over all lattice points (as in a CPU implementation), we create `NX x NY x NZ` concurrent and independent threads, one per lattice point, and assign to each thread all operations related to that point, either in position or in momentum space; see the figure below. Switching between the spaces is performed by the parallel cuFFT implementation, which is more than 100 times faster than a CPU implementation (such as FFTW).
![td-pscheme](uploads/b20710aa7608e7240a6f36aa32ac5659/td-pscheme.png)
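This per-point pattern maps directly onto a CUDA kernel launched with one thread per lattice point. The sketch below is illustrative only; the kernel and array names (`local_functional`, `compute_local_potential`, `rho`, `potential`) are hypothetical and not taken from the W-SLDA source:
```c
// Minimal sketch (hypothetical names): one CUDA thread per lattice point,
// each thread reading and writing only quantities local to its point.
__device__ double local_functional(double rho)
{
    return rho * rho;  // placeholder for the actual local energy-density functional
}

__global__ void compute_local_potential(const double *rho, double *potential, int nxyz)
{
    int ixyz = blockIdx.x * blockDim.x + threadIdx.x;  // linear lattice-point index
    if (ixyz < nxyz)
        potential[ixyz] = local_functional(rho[ixyz]);  // purely local update
}

// Launch with NX*NY*NZ threads in total, for example:
//   int nxyz = NX * NY * NZ;
//   int threads = 256;
//   int blocks = (nxyz + threads - 1) / threads;
//   compute_local_potential<<<blocks, threads>>>(d_rho, d_potential, nxyz);
```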
# MPI space and GPUs
Time-dependent codes evolve quasiparticle wave functions (qpwfs), whose number depends mainly on the lattice size and on the value of the cut-off energy `ec`. The number of wave functions is printed under the name `nwf`, for example:
```
# INIT2: nwf=46032 wave-functions to scatter
```
Quasiparticle wave functions are distributed uniformly among the `np` MPI processes. For example, if the case above is executed with `np=32`, then each process is responsible for evolving `nwfip = 46032/32 = 1438.5` qpwfs (in practice, each process evolves either 1438 or 1439 of them). The qpwfs are evolved by GPUs, which requires that a GPU be assigned to each MPI process. Suppose that the code is executed on `4 nodes`, each equipped with `4 GPUs`, and consider the following execution command:
```bash
mpiexec -ppn 8 -np 32 ./td-wslda-2d input.txt
```
where:
* `ppn`: number of processes per node,
* `np`: total number of MPI processes.
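With this command, `8 x 4 = 32` processes are started in total, so on each node there are more processes than GPUs and several processes may share a device. The sketch below shows one possible way a process could compute its share of qpwfs and attach to a GPU on its node; the variable names and the rank-to-device mapping are illustrative assumptions, not the actual W-SLDA implementation:
```c
/* Hedged sketch (hypothetical names): split nwf qpwfs over np MPI processes
   and attach each process to one of the GPUs visible on its node. */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Numbers from the example above: nwf = 46032 qpwfs, np = 32 processes.
       Each rank evolves either nwf/np or nwf/np + 1 wave functions. */
    int nwf = 46032;
    int nwfip = nwf / np + (rank < nwf % np ? 1 : 0);

    /* One common pattern: map the rank within the node onto the visible
       devices, so that with ppn=8 and 4 GPUs two ranks share each GPU. */
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    int ppn = 8;                          /* processes per node, as in -ppn 8 */
    cudaSetDevice((rank % ppn) % ngpus);

    printf("rank %d: evolving %d qpwfs\n", rank, nwfip);

    MPI_Finalize();
    return 0;
}
```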