where:
|
|
* in the input file: `gpuspernode 4`.
|
|
|
When executing the code, the following mapping `MPI Process <--> GPU` will be applied:
|
|
|
|
|
|
![td-scheme-2](uploads/99212e2b948a20702ce6c6937c176b8e/td-scheme-2.png)
|
|
|
|
|
|
|
|
|
In this case each MPI process is connected to one GPU.
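The one-to-one mapping shown above is commonly implemented as a round-robin assignment of ranks to the GPUs on a node. A minimal Python sketch of that assignment (the function name and parameters are illustrative, not part of the code's actual interface):

```python
def gpu_for_rank(rank: int, gpus_per_node: int) -> int:
    """Round-robin mapping of an MPI rank to a local GPU id (illustrative)."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be positive")
    return rank % gpus_per_node

# With `gpuspernode 4`, ranks 0-7 land on GPUs 0,1,2,3,0,1,2,3
mapping = [gpu_for_rank(r, 4) for r in range(8)]
print(mapping)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In a real MPI+CUDA code this mapping is typically applied by each rank calling `cudaSetDevice` with its local GPU id before any other CUDA work.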
|
|
|
Alternatively, one can use:
|
If the number of lattice points:
|
|
* 2d code: `N = NX x NY`,
|
|
|
* 1d code: `N = NX`,
|
|
|
|
|
|
satisfies the criterion `N >> number_of_CUDA_cores`, then it is recommended to run the code with the number of MPI processes equal to the number of GPUs, i.e. each GPU is assigned to exactly one MPI process. The number of CUDA cores depends on the GPU type, but it is typically of the order of a few thousand. If the condition is not satisfied, the user may consider assigning several MPI processes to a single GPU, as this can provide better performance.
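The `N >> number_of_CUDA_cores` condition can be turned into a quick sanity check before choosing the process layout. A small sketch, where the factor of 10 and the core count of 5120 are illustrative assumptions rather than values prescribed by the code:

```python
def gpu_is_saturated(n_points: int, cuda_cores: int, factor: int = 10) -> bool:
    """Heuristic: the lattice is 'much larger' than the GPU's core count
    when it exceeds the core count by at least `factor` (assumed value)."""
    return n_points >= factor * cuda_cores

# 2d example: NX = NY = 512 -> N = 262144 points, assumed 5120 CUDA cores
print(gpu_is_saturated(512 * 512, 5120))  # True  -> one MPI process per GPU
# 1d example: NX = 1024 -> the GPU is underutilized
print(gpu_is_saturated(1024, 5120))       # False -> consider several processes per GPU
```

When the check returns `False`, oversubscribing the GPU with multiple MPI processes can recover throughput, at the cost of extra context switching on the device.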
|
|