The W-SLDA Toolkit provides a script, tools/td-memory.py, that estimates the number of GPUs you need to run your code efficiently. Edit the # SETTINGS section of the script and then run it. Example:
# SETTINGS
NX = 128
NY = 128
NZ = 16
codedim = 2 # dimensionality of the code
nwf = 70141 # number of wave-functions if you know it; set to None to let the script use a simple estimate
mem_per_gpu = 16.0 # in GB
min_mem_utilization = 2.0 # in GB
Note that the number of wave-functions to be evolved is typically printed by the st-wslda code when it writes them to files. If you do not know this number, set nwf=None and the script will use an estimate instead.
Running the script:
[gabrielw@wutdell tools]$ python td-memory.py
MINIMAL NUMBER OF GPUs=24
The script also displays a plot:

To achieve good code performance, it is recommended that each GPU card uses about 50% or more of its memory capacity. In the given example, this means running the code on fewer than 50 GPUs.
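The upper bound quoted for this example can be checked with a short back-of-the-envelope calculation. The sketch below is not part of td-memory.py; the function name max_gpus_for_utilization is hypothetical, and it assumes that the total memory footprint is roughly the minimal number of GPUs times the memory per GPU, which slightly overestimates the real footprint.

```python
import math


def max_gpus_for_utilization(min_gpus, mem_per_gpu_gb, target_fraction=0.5):
    """Rough upper bound on the GPU count that keeps each card at or above
    target_fraction of its memory. Hypothetical helper, not W-SLDA code."""
    # Assumption: the job nearly fills min_gpus cards, so this is an
    # upper estimate of the total memory footprint in GB.
    total_mem_gb = min_gpus * mem_per_gpu_gb
    # Each GPU should hold at least this much data, in GB.
    min_load_per_gpu_gb = target_fraction * mem_per_gpu_gb
    return math.floor(total_mem_gb / min_load_per_gpu_gb)


# Values taken from the example above: 24 GPUs minimum, 16 GB per card.
print(max_gpus_for_utilization(24, 16.0))
```

With the example values this gives 48, which is consistent with the recommendation to stay below 50 GPUs.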