|  | The spirit of the SLDA method is to use only local densities during the computation. This feature makes the method an excellent candidate for massively multithreaded computing units such as GPUs. Instead of iterating over all lattice points (as in a CPU implementation), we can create `NX x NY x NZ` concurrent and independent threads, one per lattice point, and assign to each thread all operations related to a single point in either position or momentum space (see the figure below). Switching between the two spaces is performed with the parallel cuFFT library, which can be more than 100 times faster than a CPU implementation such as FFTW. |
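
The one-thread-per-lattice-point mapping described above can be sketched in CUDA as follows. This is only a minimal illustration, not the toolkit's actual code: the names `local_density_kernel` and `local_density_on_gpu`, the lattice size, and the choice of a local density `|psi|^2` as the per-point operation are all assumptions made for the example. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

// Hypothetical lattice size, chosen only for this example.
#define NX 16
#define NY 16
#define NZ 16

// One thread per lattice point: the thread with flat index ixyz reads the
// wave function at its own point and writes the local density there.
// No thread needs data from any other point, so all threads are independent.
__global__ void local_density_kernel(const cuDoubleComplex *psi,
                                     double *density, int n)
{
    int ixyz = blockIdx.x * blockDim.x + threadIdx.x;
    if (ixyz >= n) return;  // the last block may have surplus threads

    // The flat index can be decoded into lattice coordinates when the
    // operation depends on position (unused in this simple example):
    // int ix = ixyz / (NY * NZ);
    // int iy = (ixyz / NZ) % NY;
    // int iz = ixyz % NZ;

    double re = cuCreal(psi[ixyz]);
    double im = cuCimag(psi[ixyz]);
    density[ixyz] = re * re + im * im;
}

// Host helper (hypothetical): copies psi to the GPU, launches one thread
// per lattice point, and copies the density back to the host.
std::vector<double> local_density_on_gpu(const std::vector<cuDoubleComplex> &psi)
{
    const int n = static_cast<int>(psi.size());
    assert(n > 0);

    cuDoubleComplex *d_psi = nullptr;
    double *d_density = nullptr;
    cudaMalloc((void **)&d_psi, n * sizeof(cuDoubleComplex));
    cudaMalloc((void **)&d_density, n * sizeof(double));
    cudaMemcpy(d_psi, psi.data(), n * sizeof(cuDoubleComplex),
               cudaMemcpyHostToDevice);

    // Enough blocks of 256 threads to cover all n lattice points.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    local_density_kernel<<<blocks, threads>>>(d_psi, d_density, n);

    std::vector<double> density(n, 0.0);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(density.data(), d_density, n * sizeof(double),
               cudaMemcpyDeviceToHost);

    cudaFree(d_psi);
    cudaFree(d_density);
    return density;
}
```

Calling `local_density_on_gpu` with a vector of `NX * NY * NZ` complex values returns the density at every lattice point. The same launch pattern applies to operations in momentum space: after a forward transform, each thread would handle one momentum-space point instead of one position-space point.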