Parallelization scheme of time dependent codes

Last updated Jan 02, 2021 by Gabriel Wlazłowski
Quasiparticle wave functions are distributed uniformly among the `np` MPI processes. For example, if the above example is executed with `np=32`, then each process is responsible for evolving `nwfip = 46032/32 = 1438.5` qpwfs on average (in practice, each process evolves either 1438 or 1439 qpwfs). Qpwfs are evolved by GPUs, which requires that a GPU be assigned to each MPI process. Suppose the code is executed on `4 nodes`, each equipped with `4 GPUs`. Consider the following execution command:
```bash
mpiexec -ppn 4 -np 16 ./td-wslda-2d input.txt
```
where:
* `ppn`: processes per node,
* `np`: number of processes,
* in the input file: `gpuspernode 4`.
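The uniform split of the 46032 qpwfs over 32 ranks can be sketched as follows. This is a minimal illustration, not the actual W-SLDA source; it assumes the first ranks absorb the remainder, while the real assignment in the code may order things differently:

```python
# Illustration only (not W-SLDA source code): split `nwf` quasiparticle
# wave functions uniformly among `np_procs` MPI processes.
# With nwf = 46032 and np_procs = 32, nwf/np_procs = 1438.5, so ranks
# end up evolving either 1438 or 1439 qpwfs.
def qpwfs_per_rank(nwf, np_procs):
    base, rem = divmod(nwf, np_procs)
    # assumption: the first `rem` ranks take one extra wave function
    return [base + 1 if rank < rem else base for rank in range(np_procs)]

counts = qpwfs_per_rank(46032, 32)
# sum(counts) == 46032; every entry is 1438 or 1439
```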
When executing the code, the following mapping `MPI Process <--> GPU` will be applied:
![td-scheme-2](uploads/99212e2b948a20702ce6c6937c176b8e/td-scheme-2.png)
In this case each MPI process is connected to one GPU.
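A node-local round-robin mapping reproduces this one-to-one picture. The modulo rule below is an assumption for illustration; the actual scheme is described in [Configuring GPU machine](Configuring GPU machine):

```python
# Illustration of an assumed node-local round-robin MPI<->GPU mapping
# (not the actual W-SLDA mapping code).
def gpu_for_rank(rank, ppn, gpus_per_node):
    local_rank = rank % ppn            # rank index within its node
    return local_rank % gpus_per_node  # GPU index on that node

# ppn=4, gpuspernode=4, np=16 on 4 nodes: every (node, gpu) pair is used
# by exactly one rank, i.e. each MPI process gets a GPU of its own.
placement = [(rank // 4, gpu_for_rank(rank, ppn=4, gpus_per_node=4))
             for rank in range(16)]
```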
Alternatively, one can use:
```bash
mpiexec -ppn 8 -np 32 ./td-wslda-2d input.txt
```
and the distribution will be as follows:
![td-scheme](uploads/c6f4cd2418855a360754cb5baaaf6f0c/td-scheme.png)
In this case, each GPU evolves the qpwfs of two MPI processes.
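The oversubscribed case can be sketched with the same assumed node-local modulo mapping: with 8 ranks per node sharing 4 GPUs, node-local ranks 0..7 fold onto GPUs 0..3, so every GPU serves two MPI processes (illustration only; the real mapping is documented in [Configuring GPU machine](Configuring GPU machine)):

```python
# Illustration (assumed modulo mapping, not W-SLDA source): count how many
# node-local ranks land on each GPU when ranks outnumber GPUs on a node.
from collections import Counter

def ranks_per_gpu(ppn, gpus_per_node):
    return Counter(local_rank % gpus_per_node for local_rank in range(ppn))

sharing = ranks_per_gpu(ppn=8, gpus_per_node=4)
# all 4 GPUs are used, each by exactly 2 MPI processes
```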
To learn more about the `MPI <--> GPU` mapping, see: [Configuring GPU machine](Configuring GPU machine).
# Number of MPI processes per GPU: performance notes
If the number of lattice points:
* 3d code: `N = NX x NY x NZ`,
* 2d code: `N = NX x NY`,
* 1d code: `N = NX`,

satisfies the criterion `N >> number_of_CUDA_cores`, then it is recommended to run the code with the number of MPI processes equal to the number of GPUs, i.e. each GPU is assigned to only one MPI process. The number of CUDA cores depends on the GPU type, but it is typically of the order of a few thousand. If the condition is not satisfied, the user may consider assigning several MPI processes to a single GPU, as this can provide better performance.
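The rule of thumb can be sketched numerically. The helper below is hypothetical (it is not a setting of the code), and the `margin` encoding ">>" as well as the core counts are illustrative numbers only:

```python
# Hypothetical helper (illustration, not part of W-SLDA): decide whether a
# single MPI process already saturates a GPU, i.e. N >> number_of_CUDA_cores.
def gpu_saturated(n_lattice, cuda_cores, margin=10):
    # `margin` is an assumed threshold for ">>"
    return n_lattice >= margin * cuda_cores

n3d = 64 * 64 * 64                     # 3d code: N = NX x NY x NZ = 262144
gpu_saturated(n3d, cuda_cores=5120)    # True: use one MPI process per GPU
gpu_saturated(32 * 32, 5120)           # False: consider sharing a GPU
```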