QUDA 1.0.0
Public Member Functions | |
CopyColorSpinor (Arg &arg, const ColorSpinorField &out, const ColorSpinorField &in, QudaFieldLocation location) | |
virtual | ~CopyColorSpinor () |
void | apply (const cudaStream_t &stream) |
TuneKey | tuneKey () const |
long long | flops () const |
long long | bytes () const |
Private Member Functions | |
unsigned int | sharedBytesPerThread () const |
unsigned int | sharedBytesPerBlock (const TuneParam &param) const |
bool | advanceSharedBytes (TuneParam &param) const |
bool | tuneGridDim () const |
unsigned int | minThreads () const |
Member Functions inherited from quda::TunableVectorY | |
TunableVectorY (unsigned int vector_length_y) | |
bool | advanceBlockDim (TuneParam &param) const |
void | initTuneParam (TuneParam &param) const |
void | defaultTuneParam (TuneParam &param) const |
void | resizeVector (int y) const |
void | resizeStep (int y) const |
Member Functions inherited from quda::Tunable | |
Tunable () | |
virtual | ~Tunable () |
virtual void | preTune () |
virtual void | postTune () |
virtual int | tuningIter () const |
virtual std::string | paramString (const TuneParam &param) const |
virtual std::string | perfString (float time) const |
virtual bool | advanceTuneParam (TuneParam &param) const |
void | checkLaunchParam (TuneParam &param) |
CUresult | jitifyError () const |
CUresult & | jitifyError () |
virtual bool | tuneAuxDim () const |
virtual bool | tuneSharedBytes () const |
virtual bool | advanceGridDim (TuneParam &param) const |
virtual unsigned int | maxBlockSize (const TuneParam &param) const |
virtual unsigned int | maxGridSize () const |
virtual unsigned int | minGridSize () const |
virtual int | gridStep () const |
gridStep sets the step size used when iterating over the grid size in advanceGridDim. | |
virtual int | blockStep () const |
virtual int | blockMin () const |
virtual void | resetBlockDim (TuneParam &param) const |
unsigned int | maxBlocksPerSM () const |
For some reason this can't be queried from the device properties, so it is set here, based on Table 14 of the CUDA Programming Guide 10.0 (Technical Specifications per Compute Capability). | |
template<typename F > | |
void | setMaxDynamicSharedBytesPerBlock (F *func) const |
Enable the maximum dynamic shared bytes for the kernel "func" (values given by maxDynamicSharedBytesPerBlock()). | |
unsigned int | maxDynamicSharedBytesPerBlock () const |
This can't be correctly queried in CUDA for all architectures, so it is set here, based on Table 14 of the CUDA Programming Guide 10.0 (Technical Specifications per Compute Capability). | |
virtual unsigned int | maxSharedBytesPerBlock () const |
The maximum shared memory that a CUDA thread block can use in the autotuner. This isn't necessarily the same as maxDynamicSharedBytesPerBlock(), since that limit may need an explicit opt-in to enable (by calling setMaxDynamicSharedBytesPerBlock() for the kernel in question). If the CUDA kernel in question does opt in, this function can be overridden to return maxDynamicSharedBytesPerBlock(). | |
virtual bool | advanceAux (TuneParam &param) const |
int | writeAuxString (const char *format,...) |
Private Attributes | |
Arg & | arg |
const ColorSpinorField & | meta |
const QudaFieldLocation | location |
Attributes inherited from quda::TunableVectorY | |
unsigned int | vector_length_y |
unsigned int | step_y |
bool | tune_block_x |
Attributes inherited from quda::Tunable | |
char | aux [TuneKey::aux_n] |
CUresult | jitify_error |
Definition at line 162 of file copy_color_spinor.cuh.
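The maxSharedBytesPerBlock() entry in the member list describes an override pattern: a kernel that opts in to the larger dynamic shared memory limit reports that larger limit to the autotuner. The following is a minimal host-only sketch of that pattern, not QUDA's implementation. `TunableSketch`, `OptInKernelSketch`, `fitsSharedLimit` and both byte limits are hypothetical stand-ins chosen for illustration; the real limits depend on the compute capability.

```cpp
#include <cassert>

// Hypothetical stand-in for quda::Tunable. The byte limits below are
// illustrative placeholders, not values taken from QUDA or a specific GPU.
struct TunableSketch {
  virtual ~TunableSketch() = default;

  // Largest dynamic shared memory per block once a kernel has opted in.
  unsigned int maxDynamicSharedBytesPerBlock() const { return 96u * 1024u; }

  // Limit the autotuner uses when the kernel has NOT opted in.
  virtual unsigned int maxSharedBytesPerBlock() const { return 48u * 1024u; }
};

// A kernel wrapper that is assumed to have opted in, so it overrides the
// autotuner limit to return the larger dynamic limit.
struct OptInKernelSketch : TunableSketch {
  unsigned int maxSharedBytesPerBlock() const override {
    return maxDynamicSharedBytesPerBlock();
  }
};

// True when a requested dynamic allocation fits the limit the autotuner
// would use for this kernel.
inline bool fitsSharedLimit(const TunableSketch &kernel, unsigned int bytes) {
  // Invariant of this sketch: the tuner limit never exceeds the opt-in limit.
  assert(kernel.maxSharedBytesPerBlock() <= kernel.maxDynamicSharedBytesPerBlock());
  return bytes <= kernel.maxSharedBytesPerBlock();
}
```

With these placeholder limits, a 64 KiB request is rejected for a plain `TunableSketch` and accepted for an `OptInKernelSketch`.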
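CopyColorSpinor (Arg &arg, const ColorSpinorField &out, const ColorSpinorField &in, QudaFieldLocation location) | inline |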
Definition at line 175 of file copy_color_spinor.cuh.
References errorQuda and quda::ColorSpinorField::GammaBasis().
virtual ~CopyColorSpinor () | inline virtual |
Definition at line 181 of file copy_color_spinor.cuh.
bool advanceSharedBytes (TuneParam &param) const | inline private virtual |
The goal here is to throttle the number of thread blocks per SM by over-allocating shared memory (in order to improve L2 utilization, etc.). We thus request the smallest amount of dynamic shared memory that guarantees throttling to a given number of blocks, in order to allow some extra leeway.
Reimplemented from quda::Tunable.
Definition at line 170 of file copy_color_spinor.cuh.
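The arithmetic behind the throttling note above can be sketched as follows. This is an assumption-laden illustration rather than the code at the cited line: `throttleSharedBytes` and `residentBlocks` are hypothetical helpers, and the model considers shared memory only, ignoring the register and thread-count limits that also bound occupancy. To allow at most n blocks within S bytes, the per-block request b must satisfy floor(S / b) <= n, and the smallest such b is floor(S / (n + 1)) + 1.

```cpp
#include <cassert>

// Hypothetical helper: smallest per-block dynamic shared memory request
// (in bytes) for which at most target_blocks blocks fit into max_shared bytes.
inline unsigned int throttleSharedBytes(unsigned int max_shared,
                                        unsigned int target_blocks) {
  assert(target_blocks > 0);
  // One byte more than the largest request that would still admit
  // target_blocks + 1 blocks.
  return max_shared / (target_blocks + 1) + 1;
}

// Hypothetical helper: number of blocks that fit when each block requests
// bytes_per_block bytes (shared memory is the only limit modelled here).
inline unsigned int residentBlocks(unsigned int max_shared,
                                   unsigned int bytes_per_block) {
  assert(bytes_per_block > 0);
  return max_shared / bytes_per_block;
}
```

For example, with 49152 bytes available and a target of 3 blocks, the request is 12289 bytes: one byte less (12288) would still admit a fourth block.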
void apply (const cudaStream_t &stream) | inline virtual |
Implements quda::Tunable.
Definition at line 183 of file copy_color_spinor.cuh.
References quda::arg(), quda::TuneParam::block, quda::copyColorSpinor(), quda::copyColorSpinorKernel(), getTuning(), getVerbosity(), quda::TuneParam::grid, QUDA_CPU_FIELD_LOCATION, quda::TuneParam::shared_bytes, and quda::tuneLaunch().
Referenced by quda::genericCopyColorSpinor().
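The references listed above (QUDA_CPU_FIELD_LOCATION, a host copyColorSpinor(), a copyColorSpinorKernel(), tuneLaunch() and the grid, block and shared_bytes fields of TuneParam) suggest a dispatch on field location. The following host-only mock sketches that shape under that assumption; it is not the code at the cited line. Every name ending in `Sketch`, the two enumerators, and the placeholder launch values are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for QudaFieldLocation and quda::TuneParam.
enum LocationSketch { CPU_LOCATION, CUDA_LOCATION };

struct LaunchParamSketch {
  unsigned int grid;
  unsigned int block;
  unsigned int shared_bytes;
};

struct CopySketch {
  LocationSketch location;
  bool ran_on_host;
  LaunchParamSketch launched;

  explicit CopySketch(LocationSketch loc)
      : location(loc), ran_on_host(false), launched{0, 0, 0} {}

  // Placeholder for the autotuner lookup; a real tuner would return the
  // best launch geometry found for this kernel.
  LaunchParamSketch tuneLaunchSketch() const { return {128, 64, 0}; }

  void apply() {
    if (location == CPU_LOCATION) {
      // Host fields: run the copy as an ordinary CPU loop, no tuning needed.
      ran_on_host = true;
    } else {
      // Device fields: fetch tuned parameters, then launch, conceptually
      // kernel<<<grid, block, shared_bytes, stream>>>(arg).
      launched = tuneLaunchSketch();
      assert(launched.grid > 0 && launched.block > 0);
    }
  }
};
```

Keeping both paths behind one apply() lets the caller stay agnostic about where the fields live.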
long long bytes () const | inline virtual |
Reimplemented from quda::Tunable.
Definition at line 194 of file copy_color_spinor.cuh.
long long flops () const | inline virtual |
Implements quda::Tunable.
Definition at line 193 of file copy_color_spinor.cuh.
unsigned int minThreads () const | inline private virtual |
Reimplemented from quda::Tunable.
Definition at line 172 of file copy_color_spinor.cuh.
References quda::ColorSpinorField::VolumeCB().
unsigned int sharedBytesPerBlock (const TuneParam &param) const | inline private virtual |
Reimplemented from quda::TunableVectorY.
Definition at line 169 of file copy_color_spinor.cuh.
unsigned int sharedBytesPerThread () const | inline private virtual |
Reimplemented from quda::TunableVectorY.
Definition at line 168 of file copy_color_spinor.cuh.
bool tuneGridDim () const | inline private virtual |
Reimplemented from quda::Tunable.
Definition at line 171 of file copy_color_spinor.cuh.
TuneKey tuneKey () const | inline virtual |
Implements quda::Tunable.
Definition at line 192 of file copy_color_spinor.cuh.
References quda::LatticeField::VolString().
Arg & arg | private |
Definition at line 163 of file copy_color_spinor.cuh.
const QudaFieldLocation location | private |
Definition at line 165 of file copy_color_spinor.cuh.
const ColorSpinorField & meta | private |
Definition at line 164 of file copy_color_spinor.cuh.