... | ... | @@ -33,8 +33,8 @@ To learn more about `MPI <--> GPU` mapping see: [Configuring GPU machine](Config |
|
|
|
|
|
# Number of MPI processes per GPU: performance notes
|
|
|
If the number of lattice points:
|
|
|
* 3d code: `N = NX x NY x NZ`,
|
|
|
* 2d code: `N = NX x NY`,
|
|
|
* 1d code: `N = NX`,
|
|
|
|
|
|
satisfies the criterion `N >> number_of_CUDA_cores`, then it is recommended to run the code with the number of MPI processes equal to the number of GPUs, i.e. each GPU is assigned to exactly one MPI process. The number of CUDA cores depends on the GPU type, but it is typically of the order of a few thousand. If the condition is not satisfied, the user may consider assigning several MPI processes to a single GPU, as this can provide better performance.
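
The rule of thumb above can be sketched as a small check. This is only an illustration, not part of the code itself: the helper names `lattice_points` and `one_process_per_gpu` are hypothetical, the core count of 3584 is an example value, and reading `>>` as "at least 100 times larger" is an assumption that should be tuned by benchmarking on the target machine.

```python
from math import prod


def lattice_points(*dims: int) -> int:
    """Return N = NX (1d), NX x NY (2d) or NX x NY x NZ (3d)."""
    return prod(dims)


def one_process_per_gpu(dims, cuda_cores: int, factor: int = 100) -> bool:
    """Return True when N >> cuda_cores.

    The meaning of '>>' is an assumption here: N >= factor * cuda_cores.
    """
    return lattice_points(*dims) >= factor * cuda_cores


# Hypothetical GPU with 3584 CUDA cores.
print(one_process_per_gpu((128, 128, 128), 3584))  # True: one MPI process per GPU
print(one_process_per_gpu((64, 64), 3584))         # False: consider several MPI processes per GPU
```

When the check returns `False`, the number of MPI processes per GPU that gives the best performance is best found by timing a few short runs.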
|
|