QUDA v0.4.0
A library for QCD on GPUs
Tunable Class Reference

#include <tune_quda.h>

Public Member Functions

 Tunable ()
virtual ~Tunable ()
virtual TuneKey tuneKey () const =0
virtual void apply (const cudaStream_t &stream)=0
virtual void preTune ()
virtual void postTune ()
virtual int tuningIter () const
virtual std::string paramString (const TuneParam &param) const
virtual std::string perfString (float time) const
virtual void initTuneParam (TuneParam &param) const
virtual void defaultTuneParam (TuneParam &param) const
virtual bool advanceTuneParam (TuneParam &param) const

Protected Member Functions

virtual long long flops () const
virtual long long bytes () const
virtual int sharedBytesPerThread () const =0
virtual int sharedBytesPerBlock () const =0
virtual bool advanceGridDim (TuneParam &param) const
virtual bool advanceBlockDim (TuneParam &param) const
virtual bool advanceSharedBytes (TuneParam &param) const

Detailed Description

Definition at line 66 of file tune_quda.h.
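
The member list above marks four functions as pure virtual (tuneKey, apply, sharedBytesPerThread, sharedBytesPerBlock), so a concrete subclass has to supply at least those. The following is a minimal sketch of that shape, not code taken from QUDA. MockStream, MockTuneKey and MockTunable are hypothetical stand-ins for cudaStream_t, TuneKey and Tunable so that the snippet builds without CUDA or QUDA headers, and ScaleInPlace is an invented subclass. The sketch also assumes that preTune and postTune bracket repeated trial runs, which is why it uses them to save and restore data that apply modifies in place; the source does not document this.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch builds without CUDA or QUDA headers.
typedef int MockStream;                      // stands in for cudaStream_t
struct MockTuneKey { std::string name; };    // stands in for TuneKey

// Reduced mirror of part of the interface listed above; not the real class.
class MockTunable {
public:
    virtual ~MockTunable() {}
    virtual MockTuneKey tuneKey() const = 0;
    virtual void apply(const MockStream &stream) = 0;
    virtual void preTune() {}
    virtual void postTune() {}

protected:
    virtual int sharedBytesPerThread() const = 0;
    virtual int sharedBytesPerBlock() const = 0;
};

// Invented subclass: scales a host vector in place instead of launching a kernel.
class ScaleInPlace : public MockTunable {
public:
    ScaleInPlace(std::vector<float> &data, float factor)
        : data_(data), factor_(factor) {}

    MockTuneKey tuneKey() const override {
        MockTuneKey key;
        key.name = "scale_in_place";
        return key;
    }

    // Modifies its input, so repeated trial runs would corrupt it.
    void apply(const MockStream &) override {
        for (float &x : data_) x *= factor_;
    }

    void preTune() override { backup_ = data_; }   // save before trial runs
    void postTune() override { data_ = backup_; }  // restore afterwards

protected:
    int sharedBytesPerThread() const override { return 0; }  // no shared memory used
    int sharedBytesPerBlock() const override { return 0; }

private:
    std::vector<float> &data_;
    std::vector<float> backup_;
    float factor_;
};
```

Under these assumptions a caller would invoke preTune once, run apply as many times as needed for timing, and then call postTune so that the final, real call to apply starts from the original data.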


Constructor & Destructor Documentation

Tunable::Tunable ( ) [inline]

Definition at line 133 of file tune_quda.h.

virtual Tunable::~Tunable ( ) [inline, virtual]

Definition at line 134 of file tune_quda.h.


Member Function Documentation

virtual bool Tunable::advanceBlockDim ( TuneParam & param) const [inline, protected, virtual]

Reimplemented in DslashCuda.

Definition at line 91 of file tune_quda.h.

virtual bool Tunable::advanceGridDim ( TuneParam & param) const [inline, protected, virtual]

Reimplemented in DslashCuda, and CloverCuda< sFloat, cFloat >.

Definition at line 78 of file tune_quda.h.

virtual bool Tunable::advanceSharedBytes ( TuneParam & param) const [inline, protected, virtual]

Throttles the number of thread blocks resident per SM by over-allocating dynamic shared memory, for example to improve L2 cache utilization. Note that:

  • On Fermi, requesting more than 16 KB switches the cache configuration, so requests are currently capped at 16 KB.
  • On GT200 and older, kernel arguments are passed via shared memory, so the available space may be smaller than 16 KB. The function therefore requests the smallest amount of dynamic shared memory that guarantees throttling to a given number of blocks, which leaves some leeway.

Definition at line 113 of file tune_quda.h.
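
The "smallest amount that guarantees throttling" idea from the note above can be worked through with integer arithmetic. The sketch below is an illustration only, not the body of advanceSharedBytes: nextThrottledSharedBytes and blocksPerSm are hypothetical helpers, the 16 KB budget comes from the note, and the cap of 8 resident blocks per SM is an assumption about Fermi-class and older hardware. If k blocks currently fit, any request larger than maxShared / k bytes leaves room for at most k - 1 blocks, so the smallest such request is maxShared / k + 1.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not QUDA's implementation.
// Resident blocks per SM implied by a dynamic shared-memory request.
//   maxShared      : shared-memory budget per SM in bytes (16 KB per the note above)
//   maxBlocksPerSm : assumed hardware cap on resident blocks per SM
int blocksPerSm(int sharedBytes, int maxShared = 16 * 1024, int maxBlocksPerSm = 8)
{
    assert(sharedBytes >= 0 && maxShared > 0 && maxBlocksPerSm > 0);
    return std::min(maxShared / std::max(sharedBytes, 1), maxBlocksPerSm);
}

// Hypothetical helper, not QUDA's implementation.
// Returns the smallest request that allows strictly fewer resident blocks than
// currentBytes does, or -1 when no such request fits within maxShared.
int nextThrottledSharedBytes(int currentBytes, int maxShared = 16 * 1024,
                             int maxBlocksPerSm = 8)
{
    if (currentBytes > maxShared) return -1;   // the current request already does not fit
    int blocks = blocksPerSm(currentBytes, maxShared, maxBlocksPerSm);
    // One byte more than an even split rules out the blocks-th block.
    int next = maxShared / blocks + 1;
    return next > maxShared ? -1 : next;
}
```

With these defaults, starting from 0 bytes the first step requests 2049 bytes (seven blocks fit), the next requests 2341 bytes (six blocks fit), and once a single block remains the helper returns -1 because the following step would exceed 16 KB.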

virtual bool Tunable::advanceTuneParam ( TuneParam & param) const [inline, virtual]

Definition at line 176 of file tune_quda.h.
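
The bool return type of the advance functions suggests an iterator-style search: initialize a parameter set, evaluate it, and advance until no candidate remains. The sketch below illustrates that pattern under this assumption; it is not QUDA's tuner. MockParam, initParam, advanceParam and sweep are hypothetical names, the parameter set is reduced to a one-dimensional block size, and the step of 32 threads and the limit of 512 threads per block are assumed values.

```cpp
#include <functional>
#include <limits>

// Hypothetical stand-in for TuneParam: only a 1-D block size.
struct MockParam { int blockSize; };

// Start the sweep at the smallest candidate (assumed to be one warp of 32 threads).
void initParam(MockParam &p) { p.blockSize = 32; }

// Step to the next candidate; return false once the range is exhausted.
// The parameter is left unchanged when no further candidate exists.
bool advanceParam(MockParam &p, int maxBlock = 512)
{
    if (p.blockSize + 32 > maxBlock) return false;
    p.blockSize += 32;
    return true;
}

// Evaluate every candidate and keep the one with the lowest measured time.
// timeOf is supplied by the caller and stands in for a timed kernel launch.
MockParam sweep(const std::function<float(const MockParam &)> &timeOf)
{
    MockParam p;
    initParam(p);
    MockParam best = p;
    float bestTime = std::numeric_limits<float>::max();
    do {
        float t = timeOf(p);
        if (t < bestTime) {
            bestTime = t;
            best = p;
        }
    } while (advanceParam(p));
    return best;
}
```

The do/while form guarantees that the initial candidate is evaluated even when advanceParam reports immediately that there is nothing further to try.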

virtual void Tunable::apply ( const cudaStream_t & stream) [pure virtual]

virtual long long Tunable::bytes ( ) const [inline, protected, virtual]

virtual void Tunable::defaultTuneParam ( TuneParam & param) const [inline, virtual]

Sets default values for use when tuning is disabled.

Reimplemented in DslashCuda.

Definition at line 170 of file tune_quda.h.

virtual long long Tunable::flops ( ) const [inline, protected, virtual]

virtual void Tunable::initTuneParam ( TuneParam & param) const [inline, virtual]

Reimplemented in DslashCuda.

Definition at line 160 of file tune_quda.h.

virtual std::string Tunable::paramString ( const TuneParam & param) const [inline, virtual]

Reimplemented in DslashCuda, CloverCuda< sFloat, cFloat >, and TwistGamma5Cuda< sFloat >.

Definition at line 141 of file tune_quda.h.

virtual std::string Tunable::perfString ( float time) const [inline, virtual]

Definition at line 150 of file tune_quda.h.

virtual void Tunable::postTune ( ) [inline, virtual]

virtual void Tunable::preTune ( ) [inline, virtual]

virtual int Tunable::sharedBytesPerBlock ( ) const [protected, pure virtual]

virtual int Tunable::sharedBytesPerThread ( ) const [protected, pure virtual]

virtual TuneKey Tunable::tuneKey ( ) const [pure virtual]

virtual int Tunable::tuningIter ( ) const [inline, virtual]

Definition at line 139 of file tune_quda.h.


The documentation for this class was generated from the following file:

  • tune_quda.h