|
template<int dim, int nLayers, int face_num, typename Param > |
static __device__ int | indexFromFaceIndexExtended (int face_idx, const Param &param) |
| Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary (including odd) lattice dimensions. Specifically, we compute an index into the local volume from an index into the face. This is used by the Wilson-like face packing routines.
|
|
template<int dim, int nLayers, int face_num, typename Param > |
static __device__ int | indexFromFaceIndexStaggered (int face_idx_in, const Param &param) |
| Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd ones, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster running dimension.
|
|
template<int dim, int nLayers, int face_num, typename Param > |
static __device__ int | indexFromFaceIndexExtendedStaggered (int face_idx, const Param &param) |
| Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd ones, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster running dimension.
|
|
template<KernelType dim, int nLayers, int Dir, typename Param > |
static __device__ void | coordsFromFaceIndexStaggered (int x[], int idx, const Param &param) |
| Compute the full-lattice coordinates from the input face index. This is used by the staggered halo update kernels.
|
|
template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param > |
static __device__ __forceinline__ void | coordsFromIndex (int &idx, T *x, int &cb_idx, const Param &param) |
| Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can deal with 4-d or 5-d field and 4-d or 5-d preconditioning.
|
|
template<IndexType idxType, typename Int , typename Param > |
static __device__ __forceinline__ void | coordsFromIndex3D (int &idx, Int *const x, int &cb_idx, const Param &param) |
| Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is the variant used by the shared memory Wilson dslash.
|
|
template<int dim, typename T > |
static __device__ bool | inBoundary (const int depth, const int coord[], const T X[]) |
| Compute whether the provided coordinate is within the halo region boundary of a given dimension.
|
|
template<typename T > |
static __device__ bool | isActive (const int threadDim, int offsetDim, int offset, const int y[], const int partitioned[], const T X[]) |
| Compute whether this thread should be active for updating a given offsetDim halo. This is used by the fused halo region update kernels: here every thread has a prescribed dimension it is tasked with updating, but for the edges and vertices, the thread responsible for the entire update is the "greatest" one. Hence some threads may be labeled as a given dimension, but they have to update other dimensions too. Conversely, a given thread may be labeled for a given dimension, but if that thread lies at an edge or vertex, and we have partitioned a higher dimension, then that thread will cede to the higher thread.
|
|
template<int nDim, int nLayers, typename I , typename Param > |
static __device__ void | faceIndexFromCoords (int &face_idx, I *const x, int face_dim, const Param &param) |
| Compute the face index from the lattice coordinates.
|
|
__device__ float | __fast_pow (float a, int b) |
|
template<int nDim, QudaPCType pc_type, IndexType idxType, typename T , typename Param >
static __device__ __forceinline__ void coordsFromIndex |
( |
int & |
idx, |
|
|
T * |
x, |
|
|
int & |
cb_idx, |
|
|
const Param & |
param |
|
) |
| |
|
static |
Compute coordinates from index into the checkerboard (used by the interior Dslash kernels). This is used by the Wilson-like interior update kernels, and can deal with 4-d or 5-d field and 4-d or 5-d preconditioning.
- Parameters
-
[out] | idx | The full lattice coordinate |
[out] | cb_idx | The checkerboarded lattice coordinate |
[out] | x | Coordinates we are computing |
[in] | idx | Input checkerboarded face index |
[in] | param | Parameter struct with required meta data |
(X[0] & 1)
(X[1] & 1)
(X[2] & 1)
Definition at line 352 of file dslash_index.cuh.
References EVEN_X, EVEN_Y, EVEN_Z, QUDA_4D_PC, quda::s, and X.
Referenced by quda::neighborIndex().
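The checkerboard-to-coordinate mapping described above can be illustrated with a minimal host-side sketch. It is not the code in dslash_index.cuh: the helper name `coords_from_cb_index`, the explicit `parity` argument, and the use of `std::array` are assumptions made for illustration, and the sketch covers only a 4-d lattice whose first dimension is even (the case that the `(X[0] & 1)` test distinguishes).

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not part of QUDA: recover 4-d lattice coordinates
// from a checkerboard index and a parity, assuming X[0] is even and the
// full lattice is ordered lexicographically with x[0] running fastest.
std::array<int, 4> coords_from_cb_index(int cb_idx, int parity,
                                        const std::array<int, 4> &X)
{
  const int Xh = X[0] / 2;           // checkerboarded extent of dimension 0
  const int za = cb_idx / Xh;
  const int x0h = cb_idx - za * Xh;  // x[0] / 2
  const int zb = za / X[1];
  const int x1 = za - zb * X[1];
  const int x3 = zb / X[2];
  const int x2 = zb - x3 * X[2];
  // The low bit of x[0] is fixed by the parity of the site.
  const int x0odd = (x1 + x2 + x3 + parity) & 1;
  return {2 * x0h + x0odd, x1, x2, x3};
}
```

On a 4x4x4x4 lattice, for example, checkerboard index 2 on the even parity maps to the site (1, 1, 0, 0), whose lexicographic index is 5 and whose coordinate sum is even.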
template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexExtendedStaggered |
( |
int |
face_idx, |
|
|
const Param & |
param |
|
) |
| |
|
inline static |
Compute global extended checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd ones, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster running dimension.
- Parameters
-
[in] | face_idx | Checkerboarded face index |
[in] | param | Parameter struct with required meta data |
- Returns
- Global extended checkerboard coordinate
Definition at line 179 of file dslash_index.cuh.
References dims, R, V, and X.
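As a rough illustration of what an extended checkerboard index might look like, the following host-side sketch shifts a local coordinate by a border width and linearizes it in the enlarged lattice. The reading of `R` as a per-dimension border width, the helper name `extended_cb_index`, and the lexicographic ordering are assumptions for illustration only; the actual routine starts from a face index rather than from coordinates.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not part of QUDA: checkerboard index of a local
// site x inside an extended lattice with extents X[d] + 2 * R[d].
// Assumes lexicographic ordering with x[0] fastest and an even extended
// extent in dimension 0, so that halving the full index is valid.
int extended_cb_index(const std::array<int, 4> &x,
                      const std::array<int, 4> &X,
                      const std::array<int, 4> &R)
{
  int idx = 0;
  for (int d = 3; d >= 0; --d) {
    const int E = X[d] + 2 * R[d];    // extended extent in dimension d
    idx = idx * E + (x[d] + R[d]);    // shift into the extended lattice
  }
  return idx / 2;
}
```

With a zero border the result reduces to the ordinary checkerboard index of the local volume.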
template<int dim, int nLayers, int face_num, typename Param >
static __device__ int indexFromFaceIndexStaggered |
( |
int |
face_idx_in, |
|
|
const Param & |
param |
|
) |
| |
|
inline static |
Compute global checkerboard index from face index. The following indexing routines work for arbitrary lattice dimensions (though perhaps not odd ones, unlike the Wilson variant). Specifically, we compute an index into the local volume from an index into the face. This is used by the staggered-like face packing routines, and differs from the Wilson variant in that the halo depth is traversed in a different order: here the halo depth is the faster running dimension.
- Parameters
-
[in] | face_idx_in | Checkerboarded face index |
[in] | param | Parameter struct with required meta data |
- Returns
- Global checkerboard coordinate
Definition at line 110 of file dslash_index.cuh.
References dims, quda::s, and X.
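The difference in traversal order can be sketched as two ways of splitting a face index into a halo layer and a site within the face. This is a simplified host-side illustration that ignores checkerboarding; the struct `FaceSite` and both helper names are hypothetical. The depth-slowest ordering is shown only for contrast and is an assumption about the Wilson-like variant, which the text above does not spell out.

```cpp
#include <cassert>

// Hypothetical type, not part of QUDA: a face index split into its parts.
struct FaceSite {
  int layer;  // position along the halo depth, 0 .. nLayers - 1
  int site;   // position within a single face layer
};

// Staggered-like ordering: the halo depth is the fastest running index.
FaceSite split_depth_fastest(int face_idx, int nLayers)
{
  return {face_idx % nLayers, face_idx / nLayers};
}

// Contrasting ordering (assumed): the halo depth is the slowest index.
FaceSite split_depth_slowest(int face_idx, int faceVolume)
{
  return {face_idx / faceVolume, face_idx % faceVolume};
}
```

With three layers, face index 7 in the depth-fastest ordering lands on layer 1 of face site 2; consecutive face indices then step through the halo depth before moving to the next site in the face.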