QUDA
v1.1.0
A library for QCD on GPUs
|
#include <tune_quda.h>
Public Member Functions | |
bool | advanceBlockDim (TuneParam &param) const |
void | initTuneParam (TuneParam &param) const |
void | defaultTuneParam (TuneParam &param) const |
Public Member Functions inherited from quda::Tunable | |
Tunable () | |
virtual | ~Tunable () |
virtual TuneKey | tuneKey () const =0 |
virtual void | apply (const qudaStream_t &stream)=0 |
virtual void | preTune () |
virtual void | postTune () |
virtual int | tuningIter () const |
virtual std::string | paramString (const TuneParam &param) const |
virtual std::string | perfString (float time) const |
virtual bool | advanceTuneParam (TuneParam &param) const |
void | checkLaunchParam (TuneParam &param) |
CUresult | jitifyError () const |
CUresult & | jitifyError () |
Protected Member Functions | |
unsigned int | sharedBytesPerThread () const |
unsigned int | sharedBytesPerBlock (const TuneParam &param) const |
bool | tuneGridDim () const final |
unsigned int | minGridSize () const |
int | gridStep () const |
gridStep sets the step size when iterating the grid size in advanceGridDim. | |
unsigned int | maxBlockSize (const TuneParam &param) const |
Protected Member Functions inherited from quda::Tunable | |
virtual long long | flops () const =0 |
virtual long long | bytes () const |
virtual unsigned int | minThreads () const |
virtual bool | tuneAuxDim () const |
virtual bool | tuneSharedBytes () const |
virtual bool | advanceGridDim (TuneParam &param) const |
virtual unsigned int | maxGridSize () const |
virtual int | blockStep () const |
virtual int | blockMin () const |
virtual void | resetBlockDim (TuneParam &param) const |
unsigned int | maxBlocksPerSM () const |
Returns the maximum number of simultaneously resident blocks per SM. This can be queried directly as of CUDA 11; previously it had to be hand-coded. | |
unsigned int | maxDynamicSharedBytesPerBlock () const |
Returns the maximum dynamic shared memory per block. | |
virtual unsigned int | maxSharedBytesPerBlock () const |
The maximum shared memory that a CUDA thread block can use in the autotuner. This is not necessarily the same as maxDynamicSharedBytesPerBlock, since the larger limit may require an explicit opt-in (by calling setMaxDynamicSharedBytes for the kernel in question). If the CUDA kernel in question performs this opt-in, this function can be overridden to return maxDynamicSharedBytesPerBlock. | |
virtual bool | advanceSharedBytes (TuneParam &param) const |
virtual bool | advanceAux (TuneParam &param) const |
int | writeAuxString (const char *format,...) |
bool | tuned () |
Whether the present instance has already been tuned or not. | |
Additional Inherited Members | |
Protected Attributes inherited from quda::Tunable | |
char | aux [TuneKey::aux_n] |
CUresult | jitify_error |
This derived class is for algorithms that deploy parity across the y dimension of the thread block with no shared memory tuning. The x threads will typically correspond to the checkerboarded volume.
Definition at line 413 of file tune_quda.h.
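The thread layout described above can be illustrated with a small host-side sketch. It assumes a block whose y extent is 2 (one thread row per parity) and whose x threads index sites of the checkerboarded volume. The names `Dim3`, `SiteIndex`, and `site_of` are hypothetical and are not part of QUDA; a real kernel may also use a grid-stride loop, which is not modeled here.

```cpp
#include <cassert>

// Hypothetical stand-in for a CUDA dim3; this is not QUDA's TuneParam.
struct Dim3 {
  unsigned int x, y, z;
};

// Hypothetical result type: a checkerboard site index plus its parity.
struct SiteIndex {
  unsigned int x_cb;    // index into the checkerboarded (single-parity) volume
  unsigned int parity;  // 0 or 1, taken from the thread's y coordinate
};

// Map a (block index, thread index) pair to a checkerboard site and parity,
// assuming parity lives on the y dimension of the thread block.
inline SiteIndex site_of(unsigned int block_idx_x, Dim3 block,
                         unsigned int thread_x, unsigned int thread_y) {
  assert(block.y == 2);  // assumption of this sketch: one y row per parity
  assert(thread_x < block.x && thread_y < block.y);
  return {block_idx_x * block.x + thread_x, thread_y};
}
```

Under these assumptions, thread (5, 1) of block 3 with a 64 x 2 block handles checkerboard site 3 * 64 + 5 = 197 of parity 1, so both parities of a site are processed within the same thread block.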
|
inline virtual |
Reimplemented from quda::Tunable.
Definition at line 439 of file tune_quda.h.
|
inline virtual |
Sets default values for when tuning is disabled.
Reimplemented from quda::Tunable.
Definition at line 450 of file tune_quda.h.
|
inline protected virtual |
gridStep sets the step size when iterating the grid size in advanceGridDim.
Reimplemented from quda::Tunable.
Definition at line 428 of file tune_quda.h.
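To make the role of the step size concrete, here is a minimal standalone model of stepping through grid sizes. It is a sketch only: the struct `GridTuner` and the helper `count_candidates` are hypothetical, the fields merely play the roles of minGridSize, maxGridSize, and gridStep, and the reset-to-minimum behaviour on exhaustion is an assumption rather than a statement about QUDA's advanceGridDim.

```cpp
#include <cassert>

// Hypothetical standalone model of grid-size iteration; not QUDA code.
struct GridTuner {
  unsigned int min_grid;  // plays the role of minGridSize()
  unsigned int max_grid;  // plays the role of maxGridSize()
  unsigned int step;      // plays the role of gridStep()

  // Advance grid_x by one step. Returns true if a new candidate is available,
  // or false once the range is exhausted (grid_x is then reset to min_grid).
  bool advance(unsigned int &grid_x) const {
    assert(step > 0);
    grid_x += step;
    if (grid_x > max_grid) {
      grid_x = min_grid;
      return false;
    }
    return true;
  }
};

// Count how many grid sizes are visited when starting from min_grid.
inline unsigned int count_candidates(const GridTuner &t) {
  unsigned int grid_x = t.min_grid;
  unsigned int n = 1;  // the starting value is the first candidate
  while (t.advance(grid_x)) ++n;
  return n;
}
```

With a minimum of 4, a maximum of 16, and a step of 4, this model visits the grid sizes 4, 8, 12, and 16. A larger step shrinks the search space, which is the trade-off the step size controls.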
|
inline virtual |
Reimplemented from quda::Tunable.
Definition at line 445 of file tune_quda.h.
|
inline protected virtual |
The maximum block size in the x dimension is the total number of threads divided by the size of the y dimension. Since parity is local to the thread block in the y dimension, this halves the maximum number of threads in the x dimension.
Reimplemented from quda::Tunable.
Definition at line 436 of file tune_quda.h.
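The arithmetic can be written out as a short sketch. The helper `max_block_size_x` is hypothetical and is not the QUDA member function; it assumes the device's per-block thread limit and the block's y extent are already known.

```cpp
#include <cassert>

// Hypothetical helper: the largest x extent that keeps block.x * block.y
// within the device's per-block thread limit.
inline unsigned int max_block_size_x(unsigned int max_threads_per_block,
                                     unsigned int block_y) {
  assert(block_y > 0);
  return max_threads_per_block / block_y;
}
```

For example, with an illustrative limit of 1024 threads per block and a y extent of 2 (one row per parity), the x dimension is capped at 512 threads.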
|
inline protected virtual |
Reimplemented from quda::Tunable.
Definition at line 427 of file tune_quda.h.
|
inline protected virtual |
Implements quda::Tunable.
Definition at line 418 of file tune_quda.h.
|
inline protected virtual |
Implements quda::Tunable.
Definition at line 417 of file tune_quda.h.
|
inline final protected virtual |
Reduction kernels require grid-size tuning, so this is enabled and marked final to prevent a derived class from accidentally switching it off.
Reimplemented from quda::Tunable.
Definition at line 425 of file tune_quda.h.
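The effect of marking the function final can be shown with a generic C++ sketch. All class names here (`TunableBase`, `ReductionLike`, `MyKernel`) are hypothetical, and the base-class default is chosen arbitrarily for the illustration; it is not a statement about the default in quda::Tunable.

```cpp
// Hypothetical base class; the default return value is arbitrary.
struct TunableBase {
  virtual ~TunableBase() = default;
  virtual bool tuneGridDim() const { return false; }
};

// Intermediate class that forces grid-size tuning on and seals the choice.
struct ReductionLike : TunableBase {
  bool tuneGridDim() const final { return true; }
};

// A further-derived kernel class cannot switch grid tuning off again.
struct MyKernel : ReductionLike {
  // bool tuneGridDim() const override { return false; }
  // ^ uncommenting this is a compile error: it overrides a final function.
};

// Query through the base interface, as a tuner would.
inline bool grid_tuning_enabled(const TunableBase &t) {
  return t.tuneGridDim();
}
```

Any class deriving from `ReductionLike` therefore reports grid tuning as enabled, and an attempt to override the setting is rejected at compile time rather than surfacing later as a mistuned kernel.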