QUDA  1.0.0
dslash_index.cuh File Reference


Enumerations

enum  IndexType { EVEN_X = 0, EVEN_Y = 1, EVEN_Z = 2, EVEN_T = 3 }
 

Functions

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexExtended (int face_idx, const Param &param)
 Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary (including odd) lattice dimensions. Specifically, we compute an index into the local volume from an index into the face. This is used by the Wilson-like face packing routines.
 
template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexStaggered (int face_idx_in, const Param &param)
 Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in the order in which the halo depth is traversed: here the halo depth is the faster-running dimension.
 
template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexExtendedStaggered (int face_idx, const Param &param)
 Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in the order in which the halo depth is traversed: here the halo depth is the faster-running dimension.
 
template<KernelType dim, int nLayers, int Dir, typename Param >
static __device__ void coordsFromFaceIndexStaggered (int x[], int idx, const Param &param)
 Compute the full-lattice coordinates from the input face index. This is used by the staggered halo update kernels.
 
template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param >
static __device__ __forceinline__ void coordsFromIndex (int &idx, T *x, int &cb_idx, const Param &param)
 Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can handle 4-d or 5-d fields and 4-d or 5-d preconditioning.
 
template<IndexType idxType, typename Int , typename Param >
static __device__ __forceinline__ void coordsFromIndex3D (int &idx, Int *const x, int &cb_idx, const Param &param)
 Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is the variant used by the shared-memory Wilson dslash.
 
template<int dim, typename T >
static __device__ bool inBoundary (const int depth, const int coord[], const T X[])
 Compute whether the provided coordinate is within the halo region boundary of a given dimension.
 
template<typename T >
static __device__ bool isActive (const int threadDim, int offsetDim, int offset, const int y[], const int partitioned[], const T X[])
 Compute whether this thread should be active for updating a given offsetDim halo. This is used by the fused halo region update kernels: here every thread has a prescribed dimension it is tasked with updating, but for the edges and vertices, the thread responsible for the entire update is the "greatest" one. Hence some threads may be labeled as a given dimension, but they have to update other dimensions too. Conversely, a given thread may be labeled for a given dimension, but if that thread lies at an edge or vertex, and we have partitioned a higher dimension, then that thread will cede to the higher thread.
 
template<int nDim, int nLayers, typename I , typename Param >
static __device__ void faceIndexFromCoords (int &face_idx, I *const x, int face_dim, const Param &param)
 Compute the face index from the lattice coordinates.
 
__device__ float __fast_pow (float a, int b)
 

Enumeration Type Documentation

◆ IndexType

enum IndexType
Enumerator
EVEN_X 
EVEN_Y 
EVEN_Z 
EVEN_T 

Definition at line 332 of file dslash_index.cuh.

Function Documentation

◆ __fast_pow()

__device__ float __fast_pow (float a, int b)
inline

Definition at line 626 of file dslash_index.cuh.

◆ coordsFromFaceIndexStaggered()

template<KernelType dim, int nLayers, int Dir, typename Param >
static __device__ void coordsFromFaceIndexStaggered (int x[], int idx, const Param &param)
inline static

Compute the full-lattice coordinates from the input face index. This is used by the staggered halo update kernels.

Parameters
[out] x: Coordinates we are computing
[in] idx: Input checkerboard face index
[in] param: Parameter struct with required metadata

Definition at line 265 of file dslash_index.cuh.

References quda::EXTERIOR_KERNEL_T, quda::EXTERIOR_KERNEL_X, quda::EXTERIOR_KERNEL_Y, quda::EXTERIOR_KERNEL_Z, and X.
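For illustration only, the following host-side C++ sketch recovers full-lattice coordinates from a face index under stated assumptions; it is not QUDA's device implementation. The function name coordsFromFaceIndexDepthFast is hypothetical, the sketch uses full (non-checkerboarded) 4-d indexing, and it assumes a face layout in which the halo depth runs fastest (as described for the staggered face routines), followed by the remaining dimensions in x, y, z, t order. The boolean forward stands in for the Dir template parameter.

```cpp
#include <array>
#include <cassert>

// Hypothetical sketch, not QUDA code: full (non-checkerboarded) 4-d indexing.
// Assumed face layout: halo depth (0..nLayers-1) runs fastest, then the
// remaining three dimensions in x, y, z, t order.
// forward == false: face at x[dim] = 0 .. nLayers-1
// forward == true:  face at x[dim] = X[dim]-nLayers .. X[dim]-1
inline std::array<int, 4> coordsFromFaceIndexDepthFast(int idx, int dim, int nLayers, bool forward,
                                                      const std::array<int, 4> &X) {
  std::array<int, 4> x{};
  const int depth = idx % nLayers;  // fastest-running index: halo depth
  idx /= nLayers;
  for (int d = 0; d < 4; ++d) {  // unpack the other dimensions, x first
    if (d == dim) continue;
    x[d] = idx % X[d];
    idx /= X[d];
  }
  // Place the depth on the backward or forward end of dimension dim.
  x[dim] = forward ? X[dim] - nLayers + depth : depth;
  return x;
}
```

For example, with X = {4, 6, 8, 10}, dim = 1 and nLayers = 3, face indices 0, 1 and 2 walk through the three layers of the same (x, z, t) site before index 3 moves on to x = 1.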

◆ coordsFromIndex()

template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param >
static __device__ __forceinline__ void coordsFromIndex (int &idx, T *x, int &cb_idx, const Param &param)
static

Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can handle 4-d or 5-d fields and 4-d or 5-d preconditioning.

Parameters
[out] idx: The full lattice coordinate
[out] cb_idx: The checkerboarded lattice coordinate
[out] x: Coordinates we are computing
[in] idx: Input checkerboarded face index
[in] param: Parameter struct with required metadata

Definition at line 352 of file dslash_index.cuh.

References EVEN_X, EVEN_Y, EVEN_Z, QUDA_4D_PC, quda::s, and X.

Referenced by quda::neighborIndex().
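The checkerboard-to-coordinates step can be sketched on the host for the EVEN_X case, presumably the case in which the x extent is even. This is not QUDA's implementation: the name coordsFromCbIndexEvenX and the SiteIndex struct are hypothetical, and the sketch assumes a 4-d lattice, lexicographic ordering with x fastest, site parity (x + y + z + t) & 1, and an even X[0]. It treats the checkerboard index as the input and derives the full index from it, and it reads the dimensions from a plain array instead of a Param struct.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side sketch of an EVEN_X-style decomposition (4-d).
// Assumptions: idx = x + X0*(y + X1*(z + X2*t)), parity = (x+y+z+t) & 1,
// and X[0] even, so every x-row holds X[0]/2 sites of each parity.
struct SiteIndex {
  int idx;               // full-lattice linear index
  std::array<int, 4> x;  // (x, y, z, t) coordinates
};

inline SiteIndex coordsFromCbIndexEvenX(int cb_idx, int parity, const std::array<int, 4> &X) {
  int idx = 2 * cb_idx;  // candidate site with even x in the same x-row
  // One division per dimension; remainders obtained by subtraction.
  int aux1 = idx / X[0];
  int x = idx - aux1 * X[0];
  int aux2 = aux1 / X[1];
  int y = aux1 - aux2 * X[1];
  int t = aux2 / X[2];
  int z = aux2 - t * X[2];
  // If the candidate has the wrong parity, step to its odd-x neighbour.
  const int shift = (parity + y + z + t) & 1;
  x += shift;
  idx += shift;
  return {idx, {x, y, z, t}};
}
```

Doubling the checkerboard index always lands on an even x within the same x-row because X[0] is even, so a single conditional increment is enough to reach a site of the requested parity.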

◆ coordsFromIndex3D()

template<IndexType idxType, typename Int , typename Param >
static __device__ __forceinline__ void coordsFromIndex3D (int &idx, Int *const x, int &cb_idx, const Param &param)
static

Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is the variant used by the shared-memory Wilson dslash.

Parameters
[out]idxLinear index
[out]xCompute coordinates
[out]ch_idxLinear checkboard index
[in]paramParameter struct with required meta data

Definition at line 499 of file dslash_index.cuh.

References EVEN_X, and X.

◆ faceIndexFromCoords()

template<int nDim, int nLayers, typename I , typename Param >
static __device__ void faceIndexFromCoords (int &face_idx, I *const x, int face_dim, const Param &param)
inline static

Compute the face index from the lattice coordinates.

Parameters
[out] face_idx: Face index
[in] x: Lattice coordinates
[in] face_dim: Dimension of the face
[in] param: Input parameters
Returns
dimension this face_idx corresponds to

Definition at line 606 of file dslash_index.cuh.
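A host-side sketch of a coordinates-to-face-index mapping follows; it is an illustration under stated assumptions, not the QUDA routine. It assumes a 4-d lattice, a site lying within nLayers sites of either end of face_dim, and a face indexed lexicographically (x fastest) with extent nLayers along face_dim, so that the forward face is folded onto the same depth range as the backward face. The names faceIndexFull and faceIndexCb are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side sketch (4-d only). Assumes x lies within nLayers of
// either end of dimension face_dim, and the face is indexed lexicographically
// (x fastest) with extent nLayers along face_dim.
inline int faceIndexFull(const std::array<int, 4> &x, int face_dim, int nLayers,
                         const std::array<int, 4> &X) {
  std::array<int, 4> D = X;  // face extents
  std::array<int, 4> y = x;  // coordinates within the face
  // Fold the forward face (x >= X - nLayers) onto depths 0..nLayers-1.
  if (y[face_dim] >= nLayers) y[face_dim] -= X[face_dim] - nLayers;
  D[face_dim] = nLayers;
  return ((y[3] * D[2] + y[2]) * D[1] + y[1]) * D[0] + y[0];
}

// Checkerboarded face index obtained by halving the full index. Assumes the
// fastest-running face extent D[0] is even, so each index pair (2k, 2k+1)
// holds one site of each parity.
inline int faceIndexCb(const std::array<int, 4> &x, int face_dim, int nLayers,
                       const std::array<int, 4> &X) {
  return faceIndexFull(x, face_dim, nLayers, X) >> 1;
}
```

Folding both faces onto one depth range means a single index space serves both the backward and the forward face buffer.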

◆ inBoundary()

template<int dim, typename T >
static __device__ bool inBoundary (const int depth, const int coord[], const T X[])
inline static

Compute whether the provided coordinate is within the halo region boundary of a given dimension.

Parameters
[in]depthDepth of halo
[in]coordCoordinates
[in]XLattice dimensions
Returns
True if in boundary, else false

Definition at line 529 of file dslash_index.cuh.
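The boundary test can be sketched in a few lines, assuming that "within the halo region boundary" means within depth sites of either end of dimension dim. inBoundarySketch is a hypothetical name that mirrors the documented template signature; it is a host-side illustration, not the device function itself.

```cpp
#include <cassert>

// Hypothetical host-side sketch: a coordinate is "in the boundary" of
// dimension dim if it lies within depth sites of either end of that dimension.
template <int dim, typename T>
bool inBoundarySketch(int depth, const int coord[], const T X[]) {
  return coord[dim] < depth || coord[dim] >= X[dim] - depth;
}
```

Making dim a template parameter, as the documented signature does, lets the dimension check be resolved at compile time.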

◆ indexFromFaceIndexExtended()

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexExtended (int face_idx, const Param &param)
inline static

Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary (including odd) lattice dimensions. Specifically, we compute an index into the local volume from an index into the face. This is used by the Wilson-like face packing routines.

Parameters
[in] face_idx: Checkerboarded face index
[in] param: Parameter struct with required metadata
Returns
Global extended checkerboard index

Definition at line 13 of file dslash_index.cuh.

References R, and X.
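The face-to-volume step can be sketched on the host as below. This is a hypothetical illustration, not the QUDA routine: extendedIndexFromFaceIndex is an invented name, it works with full rather than checkerboarded indices, and it assumes that (a) face sites are ordered lexicographically (x fastest) over the local dimensions with extent nLayers along dim, (b) face_num 0 and 1 select the backward and forward faces, and (c) the extended volume has extents X[d] + 2*R[d], with interior coordinate x[d] stored at x[d] + R[d].

```cpp
#include <array>
#include <cassert>

// Hypothetical sketch (4-d, full indices). Assumptions: lexicographic face
// ordering with extent nLayers along dim; face_num 0 = backward face,
// face_num 1 = forward face; extended extents E[d] = X[d] + 2*R[d] with
// interior coordinate x[d] stored at x[d] + R[d].
inline int extendedIndexFromFaceIndex(int face_idx, int dim, int nLayers, int face_num,
                                      const std::array<int, 4> &X, const std::array<int, 4> &R) {
  std::array<int, 4> D = X;  // face extents
  D[dim] = nLayers;
  std::array<int, 4> x{};
  for (int d = 0; d < 4; ++d) {  // unpack face coordinates, x fastest
    x[d] = face_idx % D[d];
    face_idx /= D[d];
  }
  if (face_num == 1) x[dim] += X[dim] - nLayers;  // move onto the forward face
  int idx = 0;
  for (int d = 3; d >= 0; --d) idx = idx * (X[d] + 2 * R[d]) + (x[d] + R[d]);  // extended index
  return idx;
}
```

With R = {0, 0, 0, 0} the result reduces to the ordinary local-volume index; a checkerboard index could then be obtained by halving, provided the fastest-running extended extent is even.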

◆ indexFromFaceIndexExtendedStaggered()

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexExtendedStaggered (int face_idx, const Param &param)
inline static

Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in the order in which the halo depth is traversed: here the halo depth is the faster-running dimension.

Parameters
[in] face_idx: Checkerboarded face index
[in] param: Parameter struct with required metadata
Returns
Global extended checkerboard index

Definition at line 179 of file dslash_index.cuh.

References dims, R, V, and X.

◆ indexFromFaceIndexStaggered()

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexStaggered (int face_idx_in, const Param &param)
inline static

Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in the order in which the halo depth is traversed: here the halo depth is the faster-running dimension.

Parameters
[in] face_idx_in: Checkerboarded face index
[in] param: Parameter struct with required metadata
Returns
Global checkerboard index

Definition at line 110 of file dslash_index.cuh.

References dims, quda::s, and X.
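The ordering difference described for the staggered routines can be made concrete with a small host-side sketch. Both function names are hypothetical, and the depth-slowest layout is included only as a point of comparison; it is not a claim about the exact ordering used by the Wilson variant.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch, not QUDA code. A face of depth nLayers holds `area`
// sites per layer (area = product of the three other local extents).
// Each function maps a face index f to the pair (layer, site within layer).

// Depth as the slowest-running index (included only for contrast).
inline std::pair<int, int> layerSiteDepthSlow(int f, int area) { return {f / area, f % area}; }

// Depth as the fastest-running index, as described for the staggered routines:
// consecutive face indices visit all nLayers layers of one site before moving on.
inline std::pair<int, int> layerSiteDepthFast(int f, int nLayers) { return {f % nLayers, f / nLayers}; }
```

With nLayers = 3, the depth-fast layout maps face indices 0, 1, 2 to layers 0, 1, 2 of site 0 and index 3 to layer 0 of site 1, whereas the depth-slow layout keeps all of layer 0 contiguous.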

◆ isActive()

template<typename T >
static __device__ bool isActive (const int threadDim, int offsetDim, int offset, const int y[], const int partitioned[], const T X[])
inline static

Compute whether this thread should be active for updating a given offsetDim halo. This is used by the fused halo region update kernels: here every thread has a prescribed dimension it is tasked with updating, but for the edges and vertices, the thread responsible for the entire update is the "greatest" one. Hence some threads may be labeled as a given dimension, but they have to update other dimensions too. Conversely, a given thread may be labeled for a given dimension, but if that thread lies at an edge or vertex, and we have partitioned a higher dimension, then that thread will cede to the higher thread.

Parameters
[in] threadDim: Prescribed dimension of this thread
[in] offsetDim: The dimension for which we are querying whether this thread is responsible
[in] offset: The size of the hop
[in] y: Site coordinate
[in] partitioned: Array of which dimensions have been partitioned
[in] X: Lattice dimensions
Returns
True if this thread is active

Definition at line 555 of file dslash_index.cuh.
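The ownership rule described above can be sketched on the host as follows. This is a hypothetical 4-d illustration (isActiveSketch and inBoundaryDim are invented names), assuming that "at an edge or vertex" means within |offset| sites of either end of a partitioned dimension and that dimensions are ordered x < y < z < t.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: is y within depth sites of either end of dimension dim?
inline bool inBoundaryDim(int dim, int depth, const int y[], const int X[]) {
  return y[dim] < depth || y[dim] >= X[dim] - depth;
}

// Hypothetical sketch of the rule described for isActive (4-d). A thread
// labeled threadDim updates the offsetDim halo only if:
//  1. offsetDim is partitioned,
//  2. offsetDim <= threadDim (a thread handles its own and lower dimensions),
//  3. the site is not in the boundary of any partitioned dimension higher
//     than threadDim (there, the higher thread owns the update).
inline bool isActiveSketch(int threadDim, int offsetDim, int offset, const int y[],
                           const int partitioned[], const int X[]) {
  if (!partitioned[offsetDim]) return false;
  if (threadDim < offsetDim) return false;
  const int width = std::abs(offset);
  for (int d = threadDim + 1; d < 4; ++d)
    if (partitioned[d] && inBoundaryDim(d, width, y, X)) return false;  // cede to higher thread
  return true;
}
```

For a site on both the x and t boundaries with all four dimensions partitioned, the x-labeled thread cedes the x-halo update to the t-labeled thread; if t is not partitioned, the x-labeled thread keeps it.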