#include <tune_quda.h>

Public Member Functions
bool	advanceBlockDim (TuneParam &param) const

void	initTuneParam (TuneParam &param) const

void	defaultTuneParam (TuneParam &param) const

Public Member Functions inherited from quda::Tunable
	Tunable ()

virtual	~Tunable ()

virtual TuneKey	tuneKey () const =0

virtual void	apply (const qudaStream_t &stream)=0

virtual void	preTune ()

virtual void	postTune ()

virtual int	tuningIter () const

virtual std::string	paramString (const TuneParam &param) const

virtual std::string	perfString (float time) const

virtual bool	advanceTuneParam (TuneParam &param) const

void	checkLaunchParam (TuneParam &param)

CUresult	jitifyError () const

CUresult &	jitifyError ()

Protected Member Functions
unsigned int	sharedBytesPerThread () const

unsigned int	sharedBytesPerBlock (const TuneParam &param) const

bool	tuneGridDim () const final

unsigned int	minGridSize () const

int	gridStep () const
	gridStep sets the step size when iterating the grid size in advanceGridDim.

unsigned int	maxBlockSize (const TuneParam &param) const

Protected Member Functions inherited from quda::Tunable
virtual long long	flops () const =0

virtual long long	bytes () const

virtual unsigned int	minThreads () const

virtual bool	tuneAuxDim () const

virtual bool	tuneSharedBytes () const

virtual bool	advanceGridDim (TuneParam &param) const

virtual unsigned int	maxGridSize () const

virtual int	blockStep () const

virtual int	blockMin () const

virtual void	resetBlockDim (TuneParam &param) const

unsigned int	maxBlocksPerSM () const
	Returns the maximum number of simultaneously resident blocks per SM. As of CUDA 11 this can be queried directly; previously it had to be hand-coded.

unsigned int	maxDynamicSharedBytesPerBlock () const
	Returns the maximum dynamic shared memory per block.

virtual unsigned int	maxSharedBytesPerBlock () const
	The maximum shared memory that a CUDA thread block can use in the autotuner. This isn't necessarily the same as maxDynamicSharedBytesPerBlock, since that limit may need an explicit opt-in to enable (by calling setMaxDynamicSharedBytes for the kernel in question). If the CUDA kernel in question does opt in, this function can be overridden to return maxDynamicSharedBytesPerBlock.

virtual bool	advanceSharedBytes (TuneParam &param) const

virtual bool	advanceAux (TuneParam &param) const

int	writeAuxString (const char *format,...)

bool	tuned ()
	Whether the present instance has already been tuned.

Additional Inherited Members
Protected Attributes inherited from quda::Tunable
char	aux [TuneKey::aux_n]

CUresult	jitify_error

Detailed Description

This derived class is for algorithms that deploy parity across the y dimension of the thread block with no shared memory tuning. The x threads will typically correspond to the checkerboarded volume.

Definition at line 413 of file tune_quda.h.

Member Function Documentation

◆ advanceBlockDim()

bool quda::TunableLocalParityReduction::advanceBlockDim ( TuneParam & param ) const

inline virtual

Reimplemented from quda::Tunable.

Definition at line 439 of file tune_quda.h.

◆ defaultTuneParam()

void quda::TunableLocalParityReduction::defaultTuneParam ( TuneParam & param ) const

inline virtual

Sets default values for when tuning is disabled.

Reimplemented from quda::Tunable.

Definition at line 450 of file tune_quda.h.

◆ gridStep()

int quda::TunableLocalParityReduction::gridStep ( ) const

inline protected virtual

gridStep sets the step size when iterating the grid size in advanceGridDim.

Returns: Grid step size

Reimplemented from quda::Tunable.

Definition at line 428 of file tune_quda.h.
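
To illustrate what a grid step does, here is a minimal sketch of stepping a candidate grid size. It is not the body of advanceGridDim: `GridAdvance` and `advance_grid` are hypothetical names, and the rule shown (grow by a fixed step, reset to the minimum once the maximum would be exceeded) is an assumption made for the example.

```cpp
// Hypothetical result type (not part of QUDA).
struct GridAdvance {
  unsigned int grid; // grid size to try next
  bool advanced;     // false once the candidate range is exhausted
};

// Hypothetical helper (not part of QUDA): move the candidate grid size
// forward by `step`. If that would exceed `max_grid`, reset to `min_grid`
// and report that there is nothing further to try.
constexpr GridAdvance advance_grid(unsigned int grid, unsigned int step,
                                   unsigned int min_grid, unsigned int max_grid)
{
  return (grid + step <= max_grid) ? GridAdvance{grid + step, true}
                                   : GridAdvance{min_grid, false};
}
```

Under these assumptions, starting from a grid size of 16 with a step of 16 and a maximum of 64, the candidates visited are 16, 32, 48 and 64, after which the helper resets to 16 and reports that the range is exhausted. A larger step means fewer candidates to time, at the cost of a coarser search.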

◆ initTuneParam()

void quda::TunableLocalParityReduction::initTuneParam ( TuneParam & param ) const

inline virtual

Reimplemented from quda::Tunable.

Definition at line 445 of file tune_quda.h.

◆ maxBlockSize()

unsigned int quda::TunableLocalParityReduction::maxBlockSize ( const TuneParam & param ) const

inline protected virtual

The maximum block size in the x dimension is the total number of threads divided by the size of the y dimension. Since parity is local to the thread block in the y dimension, this is half the maximum number of threads per block.

Reimplemented from quda::Tunable.

Definition at line 436 of file tune_quda.h.
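
The arithmetic behind this limit can be sketched as follows. This is a minimal illustration, not QUDA code: `max_block_x` is a hypothetical helper, and the 1024-thread limit is an assumed device value rather than something queried from the hardware.

```cpp
// Hypothetical helper (not part of QUDA): largest x-dimension block size
// when block_y threads in the y dimension are reserved for parity.
constexpr unsigned int max_block_x(unsigned int max_threads_per_block,
                                   unsigned int block_y)
{
  return max_threads_per_block / block_y;
}

// Assumed device limit of 1024 threads per block, with both parities
// (block_y = 2) handled inside the same thread block.
constexpr unsigned int example_max_x = max_block_x(1024, 2);
```

With the assumed limit of 1024 threads per block and both parities placed in y, the x dimension is capped at 512 threads.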

◆ minGridSize()

unsigned int quda::TunableLocalParityReduction::minGridSize ( ) const

inline protected virtual

Reimplemented from quda::Tunable.

Definition at line 427 of file tune_quda.h.

◆ sharedBytesPerBlock()

unsigned int quda::TunableLocalParityReduction::sharedBytesPerBlock ( const TuneParam & param ) const

inline protected virtual

Implements quda::Tunable.

Definition at line 418 of file tune_quda.h.

◆ sharedBytesPerThread()

unsigned int quda::TunableLocalParityReduction::sharedBytesPerThread ( ) const

inline protected virtual

Implements quda::Tunable.

Definition at line 417 of file tune_quda.h.

◆ tuneGridDim()

bool quda::TunableLocalParityReduction::tuneGridDim ( ) const

inline final protected virtual

Reduction kernels require grid-size tuning, so this is enabled and marked final to prevent a derived class from accidentally switching it off.

Reimplemented from quda::Tunable.

Definition at line 425 of file tune_quda.h.

The documentation for this class was generated from the following file:

quda/include/tune_quda.h
