QUDA: A library for QCD on GPUs

QUDA is a library for performing calculations in lattice QCD on graphics processing units (GPUs), leveraging NVIDIA’s CUDA platform. The current release includes optimized Dirac operators and solvers for the following fermion actions:

Implementations of CG, multi-shift CG, BiCGstab, and domain-decomposition (DD) preconditioned GCR are provided, including robust mixed-precision variants supporting combinations of double, single, and half (16-bit “block floating point”) precision. The library also includes auxiliary routines necessary for Hybrid Monte Carlo, such as HISQ link fattening, force terms, and clover-field construction. Use of many GPUs in parallel is supported throughout, with communication handled by QMP or MPI. Several commonly used packages integrate support for QUDA as a compile-time option, including Chroma, MILC, CPS, and BQCD.
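To illustrate the general idea behind a mixed-precision solver, here is a minimal sketch in plain Python. It is not QUDA's API and not QUDA's implementation: it uses simple defect correction (iterative refinement), where an inner CG runs in emulated single precision and an outer loop recomputes the true residual in double precision. All function names (`f32`, `cg`, `mixed_precision_solve`) and the default tolerances are assumptions made for this example, and the matrix is a small dense symmetric positive-definite one rather than a Dirac operator.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(u, v, rnd=float):
    """Inner product; every product and partial sum is rounded with rnd."""
    s = 0.0
    for a, b in zip(u, v):
        s = rnd(s + rnd(a * b))
    return s


def matvec(A, x, rnd=float):
    """Dense matrix-vector product, rounded with rnd."""
    return [dot(row, x, rnd) for row in A]


def cg(A, b, tol, max_iter, rnd=float):
    """Plain conjugate gradient for SPD A, starting from x = 0.

    All arithmetic is passed through rnd, so rnd=f32 emulates single precision.
    Stops when the recursive residual norm drops below tol * |b|.
    """
    x = [0.0] * len(b)
    r = [rnd(bi) for bi in b]
    p = list(r)
    rr = dot(r, r, rnd)
    target = tol * tol * rr
    for _ in range(max_iter):
        if rr <= target:
            break
        Ap = matvec(A, p, rnd)
        pAp = dot(p, Ap, rnd)
        if pAp <= 0.0:  # guard against underflow or loss of positivity
            break
        alpha = rnd(rr / pAp)
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * api)) for ri, api in zip(r, Ap)]
        rr_new = dot(r, r, rnd)
        beta = rnd(rr_new / rr)
        p = [rnd(ri + rnd(beta * pi)) for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def mixed_precision_solve(A, b, tol=1e-10, inner_tol=1e-3, max_outer=50):
    """Defect correction: double-precision outer loop, single-precision inner CG.

    Returns (x, number of inner solves performed).
    """
    A_lo = [[f32(a) for a in row] for row in A]  # low-precision copy of the matrix
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    for outer in range(max_outer):
        # True residual, computed in double precision.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        r_norm = dot(r, r) ** 0.5
        if r_norm <= tol * b_norm:
            return x, outer
        # Normalize the residual so the low-precision solve sees O(1) numbers.
        scaled = [ri / r_norm for ri in r]
        e = cg(A_lo, scaled, inner_tol, 10 * len(b), f32)
        # Accumulate the correction in double precision.
        x = [xi + r_norm * ei for xi, ei in zip(x, e)]
    return x, max_outer
```

Calling `mixed_precision_solve(A, b)` on a small, well-conditioned symmetric positive-definite matrix returns a solution whose double-precision residual meets `tol`, even though most of the arithmetic is done in (emulated) single precision. The design point is that only the residual computation and the solution update need high precision; the inner solve only has to reduce the error by a modest factor each time.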


Released version

The latest release can be downloaded from the GitHub releases page.

The master branch of our GitHub repository points to the latest released version. Be sure to check out the master branch explicitly, because a fresh clone points to the latest development version by default.

Development version

The develop branch always contains the latest development version.

Note that while this branch receives some testing, it may not always be stable and is subject to frequent changes.

Older versions

Older releases are also available from GitHub.


The documentation is improving but does not yet cover everything. Good places to start are

For those interested in QUDA’s internals, reference pages generated by Doxygen are available for the current release.

Getting help

Reporting Issues

The preferred method for requesting help is to submit an issue, but this currently requires a (free) GitHub account. If reporting a bug, please be sure to specify which version of QUDA you’re using.


An alternative is to join us on our QUDA Slack team.


If you find this software useful in your work, please cite:

M. A. Clark, R. Babich, K. Barros, R. Brower, and C. Rebbi, “Solving Lattice QCD systems of equations using mixed precision solvers on GPUs,” Comput. Phys. Commun. 181, 1517 (2010) [arXiv:0911.3191 [hep-lat]].

When taking advantage of multi-GPU support, please also cite:

R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, and S. Gottlieb, “Scaling lattice QCD beyond 100 GPUs,” International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011 [arXiv:1109.2935 [hep-lat]].

When taking advantage of adaptive multigrid, please also cite:

M. A. Clark, A. Strelchenko, M. Cheng, A. Gambhir, and R. Brower, “Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization,” International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2016 [arXiv:1612.07873 [hep-lat]].

When taking advantage of block CG, please also cite:

M. A. Clark, A. Strelchenko, A. Vaquero, M. Wagner, and E. Weinberg, “Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs,” To be published in Comput. Phys. Commun. (2018) [arXiv:1710.09745 [hep-lat]].

Acknowledgment: This material is based upon work supported in part by the U.S. Department of Energy under grants DE-FC02-06ER41440, DE-FC02-06ER41449, and DE-AC05-06OR23177; the National Science Foundation under grants DGE-0221680, PHY-0427646, PHY-0835713, OCI-0946441, and OCI-1060067; as well as the PRACE project funded in part by the EU's 7th Framework Programme (FP7/2007-2013) under grants RI-211528 and FP7-261557. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Department of Energy, the National Science Foundation, or the PRACE project.