Enumerations
enum	IndexType { EVEN_X = 0, EVEN_Y = 1, EVEN_Z = 2, EVEN_T = 3 }

Functions
template<int dim, int nLayers, int face_num, typename Param >
static __device__ int	indexFromFaceIndexExtended (int face_idx, const Param &param)
	Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary (including odd) lattice dimensions. Specifically, we compute an index into the local volume from an index into the face. This is used by the Wilson-like face packing routines.

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int	indexFromFaceIndexStaggered (int face_idx_in, const Param &param)
	Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, which the Wilson variant supports). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster-running dimension.

template<int dim, int nLayers, int face_num, typename Param >
static __device__ int	indexFromFaceIndexExtendedStaggered (int face_idx, const Param &param)
	Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, which the Wilson variant supports). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster-running dimension.

template<KernelType dim, int nLayers, int Dir, typename Param >
static __device__ void	coordsFromFaceIndexStaggered (int x[], int idx, const Param &param)
	Compute the full-lattice coordinates from the input face index. This is used by the staggered halo update kernels.

template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param >
static __device__ __forceinline__ void	coordsFromIndex (int &idx, T *x, int &cb_idx, const Param &param)
	Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can deal with 4-d or 5-d fields and 4-d or 5-d preconditioning.

template<IndexType idxType, typename Int , typename Param >
static __device__ __forceinline__ void	coordsFromIndex3D (int &idx, Int *const x, int &cb_idx, const Param &param)
	Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is the variant used by the shared-memory Wilson dslash.

template<int dim, typename T >
static __device__ bool	inBoundary (const int depth, const int coord[], const T X[])
	Compute whether the provided coordinate is within the halo region boundary of a given dimension.

template<typename T >
static __device__ bool	isActive (const int threadDim, int offsetDim, int offset, const int y[], const int partitioned[], const T X[])
	Compute whether this thread should be active for updating a given offsetDim halo. This is used by the fused halo region update kernels: here every thread has a prescribed dimension it is tasked with updating, but for the edges and vertices, the thread responsible for the entire update is the "greatest" one. Hence some threads may be labeled as a given dimension, but they have to update other dimensions too. Conversely, a given thread may be labeled for a given dimension, but if that thread lies at an edge or vertex, and we have partitioned a higher dimension, then that thread will cede to the higher thread.

template<int nDim, int nLayers, typename I , typename Param >
static __device__ void	faceIndexFromCoords (int &face_idx, I *const x, int face_dim, const Param &param)
	Compute the face index from the lattice coordinates.

__device__ float	__fast_pow (float a, int b)

Enumeration Type Documentation

◆ IndexType

enum IndexType

Enumerator
EVEN_X
EVEN_Y
EVEN_Z
EVEN_T

Definition at line 332 of file dslash_index.cuh.

Function Documentation

◆ __fast_pow()

__device__ float __fast_pow	(	float	a,
		int	b
	)

inline

Definition at line 626 of file dslash_index.cuh.

◆ coordsFromFaceIndexStaggered()

template<KernelType dim, int nLayers, int Dir, typename Param >

static __device__ void coordsFromFaceIndexStaggered	(	int	x[],
		int	idx,
		const Param &	param
	)

inline static

Compute the full-lattice coordinates from the input face index. This is used by the staggered halo update kernels.

Parameters

[out]	x	Coordinates we are computing
[in]	idx	Input checkerboard face index
[in]	param	Parameter struct with required meta data

Definition at line 265 of file dslash_index.cuh.

References quda::EXTERIOR_KERNEL_T, quda::EXTERIOR_KERNEL_X, quda::EXTERIOR_KERNEL_Y, quda::EXTERIOR_KERNEL_Z, and X.

◆ coordsFromIndex()

template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param >

static __device__ __forceinline__ void coordsFromIndex	(	int &	idx,
		T *	x,
		int &	cb_idx,
		const Param &	param
	)

static

Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can deal with 4-d or 5-d fields and 4-d or 5-d preconditioning.

Parameters

[out]	idx	The full lattice coordinate
[out]	cb_idx	The checkerboarded lattice coordinate
[out]	x	Coordinates we are computing
[in]	idx	Input checkerboarded face index
[in]	param	Parameter struct with required meta data

(X[0] & 1)

(X[1] & 1)

(X[2] & 1)

Definition at line 352 of file dslash_index.cuh.

References EVEN_X, EVEN_Y, EVEN_Z, QUDA_4D_PC, quda::s, and X.

Referenced by quda::neighborIndex().
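
As a rough illustration of the checkerboard-to-coordinate mapping described above, the following host-side C++ sketch handles only the simplest case: a 4-d lattice with even X[0], corresponding in spirit to EVEN_X. The name coordsFromCbIndexSketch, the explicit parity argument and the x-fastest site ordering are assumptions made for this example. It is not the library routine, which also covers 5-d fields and the other IndexType cases.

```cpp
#include <cassert>

// Hypothetical host-side sketch, not the QUDA device function.
// Assumes a 4-d lattice, x running fastest, and even X[0].
// Fills x[] with the site coordinates and returns the full lexicographic index.
inline int coordsFromCbIndexSketch(int x[4], int cb_idx, int parity, const int X[4])
{
  assert(X[0] % 2 == 0);                       // sketch only covers even X[0]
  assert(parity == 0 || parity == 1);
  assert(cb_idx >= 0 && cb_idx < (X[0] * X[1] * X[2] * X[3]) / 2);

  const int Xh = X[0] / 2;                     // x extent of one checkerboard
  const int xh = cb_idx % Xh;                  // halved x coordinate
  int rest = cb_idx / Xh;
  x[1] = rest % X[1];
  rest /= X[1];
  x[2] = rest % X[2];
  x[3] = rest / X[2];

  // The low bit of x is chosen so that (x + y + z + t) & 1 equals the parity.
  const int oddBit = (parity + x[1] + x[2] + x[3]) & 1;
  x[0] = 2 * xh + oddBit;

  return ((x[3] * X[2] + x[2]) * X[1] + x[1]) * X[0] + x[0];
}
```

With these assumptions, halving the returned full index recovers cb_idx, and the sum of the coordinates has the requested parity.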


◆ coordsFromIndex3D()

template<IndexType idxType, typename Int , typename Param >

static __device__ __forceinline__ void coordsFromIndex3D	(	int &	idx,
		Int *const	x,
		int &	cb_idx,
		const Param &	param
	)

static

Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is the variant used by the shared-memory Wilson dslash.

Parameters

[out]	idx	Linear index
[out]	x	Computed coordinates
[out]	cb_idx	Linear checkerboard index
[in]	param	Parameter struct with required meta data

Definition at line 499 of file dslash_index.cuh.

References EVEN_X, and X.

◆ faceIndexFromCoords()

template<int nDim, int nLayers, typename I , typename Param >

static __device__ void faceIndexFromCoords	(	int &	face_idx,
		I *const	x,
		int	face_dim,
		const Param &	param
	)

inline static

Compute the face index from the lattice coordinates.

Parameters

[out]	face_idx	Face index
[in]	x	Lattice coordinates
[in]	face_dim	Which dimension
[in]	param	Input parameters

Returns: nothing (the function is declared void); the computed face index is written to face_idx

Definition at line 606 of file dslash_index.cuh.
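
One plausible way to picture the mapping from lattice coordinates to a face index is sketched below in host-side C++. The function name faceIndexFromCoordsSketch, the restriction to 4 dimensions, the x-fastest ordering and returning the result by value are all assumptions of this example. The actual routine writes to face_idx and takes its dimensions from param.

```cpp
#include <cassert>

// Hypothetical host-side sketch, not the QUDA device function.
// Assumes a 4-d lattice with x running fastest and a face nLayers deep.
template <int nLayers>
int faceIndexFromCoordsSketch(const int x[4], int face_dim, const int X[4])
{
  assert(face_dim >= 0 && face_dim < 4);
  int D[4] = {X[0], X[1], X[2], X[3]};
  int y[4] = {x[0], x[1], x[2], x[3]};

  // Fold the coordinate along face_dim into [0, nLayers): sites on the far
  // face are shifted down by (X[face_dim] - nLayers).
  if (y[face_dim] >= nLayers) y[face_dim] -= D[face_dim] - nLayers;
  D[face_dim] = nLayers;                       // the face is only nLayers thick

  // Lexicographic index within the face, halved for the checkerboard.
  return ((((y[3] * D[2] + y[2]) * D[1] + y[1]) * D[0]) + y[0]) >> 1;
}
```

In this sketch a site on the near face and the site at the same transverse position on the far face map to the same face index, so the caller is assumed to know which face it is packing.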

◆ inBoundary()

template<int dim, typename T >

static __device__ bool inBoundary	(	const int	depth,
		const int	coord[],
		const T	X[]
	)

inline static

Compute whether the provided coordinate is within the halo region boundary of a given dimension.

Parameters

[in]	depth	Depth of halo
[in]	coord	Coordinates
[in]	X	Lattice dimensions

Returns: True if in boundary, else false

Definition at line 529 of file dslash_index.cuh.
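
A minimal host-side C++ sketch of the boundary test described above follows. It assumes the halo region of dimension dim consists of the first and last depth sites along that dimension. The name inBoundarySketch is hypothetical and the __device__ qualifier is dropped so that the example compiles as ordinary C++.

```cpp
#include <cassert>

// Hypothetical host-side sketch, not the QUDA device function.
// Returns true if coord[dim] lies within `depth` sites of either end of
// dimension `dim` of a lattice with extents X[].
template <int dim, typename T>
bool inBoundarySketch(const int depth, const int coord[], const T X[])
{
  assert(depth >= 0);                          // assumed precondition of this sketch
  return coord[dim] < depth || coord[dim] >= X[dim] - depth;
}
```

For example, on a lattice with X[0] = 8 and depth 1, only the sites with coord[0] equal to 0 or 7 are reported as being in the boundary of dimension 0.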

◆ indexFromFaceIndexExtended()

template<int dim, int nLayers, int face_num, typename Param >

static __device__ int indexFromFaceIndexExtended	(	int	face_idx,
		const Param &	param
	)

inline static

Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary (including odd) lattice dimensions. Specifically, we compute an index into the local volume from an index into the face. This is used by the Wilson-like face packing routines.

Parameters

[in]	face_idx	Checkerboarded face index
[in]	param	Parameter struct with required meta data

Returns: Global extended checkerboard index

Definition at line 13 of file dslash_index.cuh.

References R, and X.

◆ indexFromFaceIndexExtendedStaggered()

template<int dim, int nLayers, int face_num, typename Param >

static __device__ int indexFromFaceIndexExtendedStaggered	(	int	face_idx,
		const Param &	param
	)

inline static

Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, which the Wilson variant supports). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster-running dimension.

Parameters

[in]	face_idx	Checkerboarded face index
[in]	param	Parameter struct with required meta data

Returns: Global extended checkerboard index

Definition at line 179 of file dslash_index.cuh.

References dims, R, V, and X.

◆ indexFromFaceIndexStaggered()

template<int dim, int nLayers, int face_num, typename Param >

static __device__ int indexFromFaceIndexStaggered	(	int	face_idx_in,
		const Param &	param
	)

inline static

Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd dimensions, which the Wilson variant supports). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster-running dimension.

Parameters

[in]	face_idx_in	Checkerboarded face index
[in]	param	Parameter struct with required meta data

Returns: Global checkerboard index

Definition at line 110 of file dslash_index.cuh.

References dims, quda::s, and X.
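
The difference in traversal order between the staggered-like and Wilson-like routines can be pictured with the toy host-side C++ sketch below. It ignores checkerboarding and the mapping into the local volume, and only shows how a linear face index might decompose into a halo layer and a site within that layer under the two orderings. The struct FaceSite, both function names, and the assumption that the Wilson-like ordering has the depth running slowest are illustrative choices, not taken from the library.

```cpp
#include <cassert>

// Hypothetical helper type: which halo layer, and which site within it.
struct FaceSite {
  int layer;
  int site;
};

// Depth running slowest (assumed here for the Wilson-like ordering):
// all sites of layer 0 come first, then all sites of layer 1, and so on.
inline FaceSite splitDepthSlowest(int face_idx, int nLayers, int faceVolume)
{
  assert(face_idx >= 0 && face_idx < nLayers * faceVolume);
  return {face_idx / faceVolume, face_idx % faceVolume};
}

// Depth running fastest (the staggered-like ordering described above):
// consecutive face indices step through the layers of one transverse site.
inline FaceSite splitDepthFastest(int face_idx, int nLayers, int faceVolume)
{
  assert(face_idx >= 0 && face_idx < nLayers * faceVolume);
  return {face_idx % nLayers, face_idx / nLayers};
}
```

With 3 layers of 4 sites each, face index 5 lands on layer 1, site 1 in the depth-slowest ordering and on layer 2, site 1 in the depth-fastest ordering.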

◆ isActive()

template<typename T >

static __device__ bool isActive	(	const int	threadDim,
		int	offsetDim,
		int	offset,
		const int	y[],
		const int	partitioned[],
		const T	X[]
	)

inline static

Compute whether this thread should be active for updating a given offsetDim halo. This is used by the fused halo region update kernels: here every thread has a prescribed dimension it is tasked with updating, but for the edges and vertices, the thread responsible for the entire update is the "greatest" one. Hence some threads may be labeled as a given dimension, but they have to update other dimensions too. Conversely, a given thread may be labeled for a given dimension, but if that thread lies at an edge or vertex, and we have partitioned a higher dimension, then that thread will cede to the higher thread.

Parameters

[in]	threadDim	Prescribed dimension of this thread
[in]	offsetDim	The dimension for which we are querying whether this thread is responsible
[in]	offset	The size of the hop
[in]	y	Site coordinate
[in]	partitioned	Array of which dimensions have been partitioned
[in]	X	Lattice dimensions

Returns: True if this thread is active

Definition at line 555 of file dslash_index.cuh.
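
The ceding rule described above can be sketched in host-side C++ as follows. This is one reading of the description, assuming four dimensions, int lattice extents, and a halo width equal to the magnitude of offset. The names isActiveSketch and inHaloSketch are hypothetical and the code is not the library implementation.

```cpp
#include <cassert>

// Hypothetical helper: is y[dim] within `width` sites of either end?
inline bool inHaloSketch(int dim, int width, const int y[], const int X[])
{
  return y[dim] < width || y[dim] >= X[dim] - width;
}

// Hypothetical host-side sketch of the "greatest thread wins" rule.
inline bool isActiveSketch(int threadDim, int offsetDim, int offset, const int y[],
                           const int partitioned[], const int X[])
{
  assert(threadDim >= 0 && threadDim < 4 && offsetDim >= 0 && offsetDim < 4);

  if (!partitioned[offsetDim]) return false;   // no halo to update in this dimension
  if (threadDim < offsetDim) return false;     // a thread handles only dims <= its own

  const int width = offset > 0 ? offset : -offset;

  // Cede to a higher partitioned dimension if this site also lies in its halo,
  // since the thread labeled with that dimension performs the whole update.
  for (int d = threadDim + 1; d < 4; d++) {
    if (partitioned[d] && inHaloSketch(d, width, y, X)) return false;
  }
  return true;
}
```

Under these assumptions a thread labeled with dimension 0 at a site that also lies in the partitioned dimension-3 halo reports itself inactive, while the thread labeled with dimension 3 at the same site is active for the dimension-0 update.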
