QUDA
1.0.0
quda::dslash Namespace Reference
Classes | |
struct | DslashAsync |
struct | DslashBasic |
struct | DslashCommsPattern |
struct | DslashFactory |
struct | DslashFusedExterior |
struct | DslashFusedExteriorAsync |
struct | DslashFusedGDR |
struct | DslashFusedGDRRecv |
struct | DslashFusedZeroCopy |
struct | DslashFusedZeroCopyPack |
struct | DslashFusedZeroCopyPackGDRRecv |
struct | DslashGDR |
struct | DslashGDRRecv |
struct | DslashNC |
struct | DslashPolicyImp |
class | DslashPolicyTune |
struct | DslashZeroCopy |
struct | DslashZeroCopyPack |
struct | DslashZeroCopyPackGDRRecv |
Functions | |
template<typename Arg , typename Dslash > | |
void | setFusedParam (Arg &param, Dslash &dslash, const int *faceVolumeCB) |
template<typename Dslash > | |
void | issueRecv (cudaColorSpinorField &input, const Dslash &dslash, cudaStream_t *stream, bool gdr) |
This helper function simply posts all receives in all directions. More... | |
template<typename Dslash > | |
void | issuePack (cudaColorSpinorField &in, const Dslash &dslash, int parity, MemoryLocation location, int packIndex) |
This helper function simply posts the packing kernel needed for halo exchange. More... | |
template<typename Dslash > | |
void | issueGather (cudaColorSpinorField &in, const Dslash &dslash) |
This helper function simply posts the device-host memory copies of all halos in all dimensions and directions. More... | |
template<typename T > | |
int | getStreamIndex (const T &dslashParam) |
Returns a stream index for posting the pack/scatters to. We desire a stream index that is not being used for peer-to-peer communication. This is used by the fused halo dslash kernels where we post all scatters to the same stream so we only have a single event to wait on before the exterior kernel is applied, and by the zero-copy dslash kernels where we want to post the packing kernel to an unused stream. More... | |
template<typename Dslash > | |
bool | commsComplete (cudaColorSpinorField &in, const Dslash &dslash, int dim, int dir, bool gdr_send, bool gdr_recv, bool zero_copy_recv, bool async, int scatterIndex=-1) |
Wrapper for querying whether communication has finished in the dslash and, if it has, taking the appropriate action. More... | |
template<typename T > | |
void | completeDslash (const ColorSpinorField &in, const T &dslashParam) |
Ensure that the dslash is complete. By construction, the dslash will have completed (or be in flight) on this process; however, we must also ensure that no local work begins until any communication in flight from this process to another has completed. This prevents a race condition where we could start updating the local buffers on a subsequent computation before we have finished sending. More... | |
template<typename Dslash > | |
void | setMappedGhost (Dslash &dslash, ColorSpinorField &in, bool to_mapped) |
Sets the ghosts to the mapped CPU ghost buffer, or unsets them if already set. Note that this must not be called until after the interior dslash has been called, since the interior dslash sets the peer-to-peer ghost pointers, and this needs to be done without the mapped ghost enabled. More... | |
void | enable_policy (QudaDslashPolicy p) |
void | disable_policy (QudaDslashPolicy p) |
Variables | |
int | it = 0 |
cudaEvent_t | packEnd [2] |
cudaEvent_t | gatherStart [Nstream] |
cudaEvent_t | gatherEnd [Nstream] |
cudaEvent_t | scatterStart [Nstream] |
cudaEvent_t | scatterEnd [Nstream] |
cudaEvent_t | dslashStart [2] |
Worker * | aux_worker |
bool | dslash_pack_compute |
bool | dslash_interior_compute |
bool | dslash_exterior_compute |
bool | dslash_comms |
bool | dslash_copy |
static cudaColorSpinorField * | inSpinor |
bool | dslash_policy_init |
int | first_active_policy |
int | first_active_p2p_policy |
std::vector< QudaDslashPolicy > | policies |
char | policy_string [TuneKey::aux_n] |
std::vector< QudaP2PPolicy > | p2p_policies |
enum quda::dslash::QudaDslashPolicy |
strong |
Definition at line 1670 of file dslash_policy.cuh.
enum quda::dslash::QudaP2PPolicy |
strong |
Enumerator | |
---|---|
QUDA_P2P_DEFAULT | |
QUDA_P2P_COPY_ENGINE | |
QUDA_P2P_REMOTE_WRITE | |
QUDA_P2P_POLICY_DISABLED | |
Definition at line 1695 of file dslash_policy.cuh.
|
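template<typename Dslash >
bool quda::dslash::commsComplete (cudaColorSpinorField &in, const Dslash &dslash, int dim, int dir, bool gdr_send, bool gdr_recv, bool zero_copy_recv, bool async, int scatterIndex=-1) |
inline |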
Wrapper for querying whether communication has finished in the dslash and, if it has, taking the appropriate action.
[in,out] | in | Field being communicated |
[in] | dslash | The dslash object |
[in] | dim | Dimension we are working on |
[in] | dir | Direction we are working on |
[in] | gdr_send | Whether GPU Direct RDMA is being used for sending |
[in] | gdr_recv | Whether GPU Direct RDMA is being used for receiving |
[in] | zero_copy_recv | Whether we are using zero-copy on the receive end (and hence do not need to do CPU->GPU copy) |
[in] | async | Whether GPU Direct Async is being used |
[in] | scatterIndex | The stream index used for posting the host-to-device memory copy in |
Definition at line 253 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), quda::cudaColorSpinorField::commsQuery(), quda::Dslash< Float >::Dagger(), errorQuda, quda::LatticeField::getIPCRemoteCopyEvent(), quda::Dslash< Float >::Nface(), quda::Nstream, PROFILE, quda::QUDA_PROFILE_COMMS_QUERY, quda::QUDA_PROFILE_SCATTER, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaStreamWaitEvent(), quda::cudaColorSpinorField::scatter(), quda::stream, and streams.
Referenced by quda::dslash::DslashBasic< Dslash >::operator()(), quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashGDR< Dslash >::operator()(), quda::dslash::DslashFusedGDR< Dslash >::operator()(), quda::dslash::DslashGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopy< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
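The parameter descriptions and the references above (commsQuery(), getIPCRemoteCopyEvent(), qudaStreamWaitEvent(), scatter()) suggest a three-way decision once a halo has arrived. The following is a minimal host-only sketch of that decision, not the QUDA implementation: `HaloAction`, `HaloState` and `commsCompleteSketch` are hypothetical names, and the mapping from flags to actions is an assumption drawn from the documentation on this page.

```cpp
#include <cassert>

// Follow-up work once a halo message has arrived (illustrative names only).
enum class HaloAction { None, WaitRemoteCopyEvent, PostScatter };

// Hypothetical description of one (dim, dir) halo transfer.
struct HaloState {
  bool arrived;       // result of the communication query
  bool peer_to_peer;  // neighbour is reached through peer-to-peer copies
};

// Returns true when the transfer has finished and reports what still has to
// happen before the exterior kernel may read the ghost zone.
bool commsCompleteSketch(const HaloState &s, bool gdr_recv, bool zero_copy_recv, HaloAction &action)
{
  action = HaloAction::None;
  if (!s.arrived) return false;  // not finished yet: the caller polls again later

  if (s.peer_to_peer) {
    // Assumed: the compute stream must wait on the remote copy event.
    action = HaloAction::WaitRemoteCopyEvent;
  } else if (!gdr_recv && !zero_copy_recv) {
    // Data landed in host memory, so a host-to-device copy is still needed.
    action = HaloAction::PostScatter;
  }
  // With GDR or zero-copy receive the data is already visible to the device.
  return true;
}
```

A caller would typically loop over all partitioned dimensions and directions, calling this until every transfer reports completion.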
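template<typename T >
void quda::dslash::completeDslash (const ColorSpinorField &in, const T &dslashParam) |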
inline |
Ensure that the dslash is complete. By construction, the dslash will have completed (or be in flight) on this process; however, we must also ensure that no local work begins until any communication in flight from this process to another has completed. This prevents a race condition where we could start updating the local buffers on a subsequent computation before we have finished sending.
Definition at line 304 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), quda::LatticeField::getIPCCopyEvent(), quda::Nstream, PROFILE, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaStreamWaitEvent(), and streams.
Referenced by quda::dslash::DslashBasic< Dslash >::operator()(), quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashGDR< Dslash >::operator()(), quda::dslash::DslashFusedGDR< Dslash >::operator()(), quda::dslash::DslashGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
void quda::dslash::disable_policy (QudaDslashPolicy p) |
inline |
Definition at line 1765 of file dslash_policy.cuh.
References QUDA_DSLASH_POLICY_DISABLED.
void quda::dslash::enable_policy (QudaDslashPolicy p) |
inline |
Definition at line 1763 of file dslash_policy.cuh.
Referenced by quda::dslash::DslashPolicyTune< Dslash >::DslashPolicyTune().
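The two functions above carry no description on this page. The sketch below shows one plausible reading, in which each policy owns a slot in a table and a disabled slot holds a sentinel value (compare the reference to QUDA_DSLASH_POLICY_DISABLED). All names ending in `Sketch` or `_sketch` are hypothetical, and the enumerators are placeholders, since the real `QudaDslashPolicy` values are not listed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder policy identifiers; Disabled doubles as the sentinel and the count.
enum class PolicySketch { Basic, FusedExterior, ZeroCopy, Disabled };

constexpr std::size_t n_policy_sketch = static_cast<std::size_t>(PolicySketch::Disabled);

// One slot per policy; every slot starts out disabled.
std::vector<PolicySketch> policy_table(n_policy_sketch, PolicySketch::Disabled);

// Enabling a policy stores the policy in its own slot.
void enable_policy_sketch(PolicySketch p)
{
  assert(p != PolicySketch::Disabled);
  policy_table[static_cast<std::size_t>(p)] = p;
}

// Disabling a policy overwrites its slot with the sentinel.
void disable_policy_sketch(PolicySketch p)
{
  assert(p != PolicySketch::Disabled);
  policy_table[static_cast<std::size_t>(p)] = PolicySketch::Disabled;
}

// Index of the first enabled slot, or -1 if none is enabled.
int first_active_policy_sketch()
{
  for (std::size_t i = 0; i < policy_table.size(); i++)
    if (policy_table[i] != PolicySketch::Disabled) return static_cast<int>(i);
  return -1;
}
```

An autotuner can then walk the table from the first active slot and skip every entry that holds the sentinel.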
template<typename T >
int quda::dslash::getStreamIndex (const T &dslashParam) |
inline |
Returns a stream index for posting the pack/scatters to. We desire a stream index that is not being used for peer-to-peer communication. This is used by the fused halo dslash kernels where we post all scatters to the same stream so we only have a single event to wait on before the exterior kernel is applied, and by the zero-copy dslash kernels where we want to post the packing kernel to an unused stream.
Definition at line 213 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), and index().
Referenced by quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopy< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
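The selection described above can be sketched as a scan over partitioned dimensions that keeps a stream whose direction is not peer-to-peer connected. This is a minimal host-only illustration under stated assumptions: `P2PTable`, `ParamSketch` and `pickStreamIndex` are hypothetical names, the table stands in for comm_peer2peer_enabled(), and the `2*dim + dir` stream layout is assumed rather than taken from this page.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the peer-to-peer state of each neighbour.
struct P2PTable {
  // enabled[dir][dim]: dir 0 = backward, dir 1 = forward, dim 0..3
  std::array<std::array<bool, 4>, 2> enabled{};
};

// Hypothetical subset of the dslash parameters: which dimensions are partitioned.
struct ParamSketch {
  std::array<bool, 4> commDim{};
};

// Pick a stream index (assumed layout 2*dim + dir) that peer-to-peer is not using.
// When several candidates exist, the one in the lowest dimension wins.
int pickStreamIndex(const ParamSketch &param, const P2PTable &p2p)
{
  int index = -1;
  for (int dim = 3; dim >= 0; dim--) {
    if (!param.commDim[dim]) continue;  // nothing is communicated in this dimension
    if (!p2p.enabled[0][dim]) index = 2 * dim + 0;
    else if (!p2p.enabled[1][dim]) index = 2 * dim + 1;
  }
  if (index == -1) index = 0;  // fully peer-to-peer connected: fall back to stream 0
  assert(index >= 0 && index < 8);
  return index;
}
```

Posting all scatters to the single returned stream means only one event has to be waited on before the fused exterior kernel starts.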
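template<typename Dslash >
void quda::dslash::issueGather (cudaColorSpinorField &in, const Dslash &dslash) |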
inline |
This helper function simply posts the device-host memory copies of all halos in all dimensions and directions.
[out] | in | Field whose halos we are communicating |
[in] | dslash | The dslash object |
Definition at line 180 of file dslash_policy.cuh.
References quda::LatticeField::bufferIndex, comm_peer2peer_enabled(), quda::Dslash< Float >::Dagger(), quda::Dslash< Float >::dslashParam, quda::cudaColorSpinorField::gather(), quda::getKernelPackT(), quda::Dslash< Float >::Nface(), PROFILE, quda::QUDA_PROFILE_EVENT_RECORD, quda::QUDA_PROFILE_GATHER, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaEventRecord(), quda::qudaStreamWaitEvent(), and streams.
Referenced by quda::dslash::DslashBasic< Dslash >::operator()(), quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashGDRRecv< Dslash >::operator()(), and quda::dslash::DslashFusedGDRRecv< Dslash >::operator()().
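template<typename Dslash >
void quda::dslash::issuePack (cudaColorSpinorField &in, const Dslash &dslash, int parity, MemoryLocation location, int packIndex) |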
inline |
This helper function simply posts the packing kernel needed for halo exchange.
[out] | in | Field that we are packing |
[in] | dslash | The dslash object |
[in] | parity | Field parity |
[in] | location | Memory location where we are packing to |
[in] | packIndex | Stream index where the packing kernel will run |
Definition at line 139 of file dslash_policy.cuh.
References quda::arg(), quda::LatticeField::bufferIndex, comm_peer2peer_enabled(), quda::Dslash< Float >::Dagger(), quda::Device, quda::Dslash< Float >::dslashParam, errorQuda, quda::getKernelPackT(), quda::Host, quda::Dslash< Float >::Nface(), quda::pack(), quda::cudaColorSpinorField::pack(), parity, PROFILE, QUDA_MAX_DIM, quda::QUDA_PROFILE_EVENT_RECORD, quda::QUDA_PROFILE_PACK_KERNEL, quda::qudaEventRecord(), quda::Remote, and streams.
Referenced by quda::dslash::DslashBasic< Dslash >::operator()(), quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashGDR< Dslash >::operator()(), quda::dslash::DslashFusedGDR< Dslash >::operator()(), quda::dslash::DslashGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopy< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
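template<typename Dslash >
void quda::dslash::issueRecv (cudaColorSpinorField &input, const Dslash &dslash, cudaStream_t *stream, bool gdr) |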
inline |
This helper function simply posts all receives in all directions.
[out] | input | Field that we are doing halo exchange |
[in] | dslash | The dslash object |
[in] | stream | Stream where the receive is being posted (effectively ignored) |
[in] | gdr | Whether we are using GPU Direct RDMA or not |
Definition at line 118 of file dslash_policy.cuh.
References quda::Dslash< Float >::Dagger(), quda::Dslash< Float >::dslashParam, quda::Dslash< Float >::Nface(), PROFILE, quda::QUDA_PROFILE_COMMS_START, quda::cudaColorSpinorField::recvStart(), and quda::stream.
Referenced by quda::dslash::DslashBasic< Dslash >::operator()(), quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashGDR< Dslash >::operator()(), quda::dslash::DslashFusedGDR< Dslash >::operator()(), quda::dslash::DslashGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), quda::dslash::DslashZeroCopy< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
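The helper functions on this page are all referenced by the policy functors' operator(). The sketch below illustrates how a basic policy might sequence them. The order shown (receives first, then pack, interior kernel, gather, polling, exterior kernel, completion) is an assumption made for illustration and is not taken from this page; `TraceSketch` and `BasicPolicySketch` are hypothetical, and the sketch only records step names instead of issuing CUDA or MPI calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the steps a policy issues; stands in for real CUDA and MPI calls.
struct TraceSketch {
  std::vector<std::string> steps;
  void post(const std::string &s) { steps.push_back(s); }
};

// Hypothetical basic policy; the ordering is an assumed, plausible one.
struct BasicPolicySketch {
  void operator()(TraceSketch &t) const
  {
    t.post("issueRecv");       // post receives so incoming halos have a destination
    t.post("issuePack");       // pack the local boundary into the send buffer
    t.post("interior");        // interior kernel can overlap with communication
    t.post("issueGather");     // device-to-host copies of the packed halos
    t.post("commsComplete");   // poll until every halo has arrived
    t.post("exterior");        // exterior kernel consumes the ghost zones
    t.post("completeDslash");  // make sure outgoing traffic has drained
  }
};
```

The fused, GDR and zero-copy policies listed under Classes would differ mainly in which of these steps they skip or merge.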
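template<typename Arg , typename Dslash >
void quda::dslash::setFusedParam (Arg &param, Dslash &dslash, const int *faceVolumeCB) |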
inline |
Definition at line 81 of file dslash_policy.cuh.
References quda::Dslash< Float >::dslashParam, quda::EXTERIOR_KERNEL_ALL, and quda::Dslash< Float >::Nface().
Referenced by quda::dslash::DslashFusedExterior< Dslash >::operator()(), quda::dslash::DslashFusedGDR< Dslash >::operator()(), quda::dslash::DslashFusedGDRRecv< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPack< Dslash >::operator()(), quda::dslash::DslashFusedZeroCopyPackGDRRecv< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
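template<typename Dslash >
void quda::dslash::setMappedGhost (Dslash &dslash, ColorSpinorField &in, bool to_mapped) |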
inline |
Sets the ghosts to the mapped CPU ghost buffer, or unsets them if already set. Note that this must not be called until after the interior dslash has been called, since the interior dslash sets the peer-to-peer ghost pointers, and this needs to be done without the mapped ghost enabled.
[in,out] | dslash | The dslash object |
[in,out] | in | The ColorSpinorField source |
[in] | to_mapped | Whether we are switching to mapped ghosts or not |
Definition at line 328 of file dslash_policy.cuh.
References quda::Dslash< Float >::augmentAux(), quda::TuneKey::aux_n, quda::LatticeField::bufferIndex, comm_peer2peer_enabled_global(), quda::Dslash< Float >::dslashParam, errorQuda, quda::Dslash< Float >::getAux(), and quda::Dslash< Float >::setAux().
Referenced by quda::dslash::DslashZeroCopy< Dslash >::operator()(), and quda::dslash::DslashFusedZeroCopy< Dslash >::operator()().
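The set/unset behaviour described above, together with the references to getAux(), augmentAux() and setAux(), suggests a toggle that saves the tuning-key string before switching and restores it afterwards. The following is a minimal host-only sketch of that idea, assuming a plain string for the aux field; `DslashSketch`, `MappedGhostToggle` and the `",zero_copy"` suffix are illustrative assumptions, and exceptions stand in for errorQuda.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the parts of a dslash object touched here.
struct DslashSketch {
  std::string aux;            // tuning-key suffix identifying the kernel variant
  bool mapped_ghost = false;  // whether kernels read ghosts from mapped host memory
};

// Switches to mapped ghosts and back, remembering the previous tuning key.
struct MappedGhostToggle {
  std::string saved_aux;
  bool set_mapped = false;

  void apply(DslashSketch &dslash, bool to_mapped)
  {
    if (to_mapped) {
      if (set_mapped) throw std::logic_error("mapped ghost already set");
      saved_aux = dslash.aux;      // keep a copy so the switch can be undone
      dslash.aux += ",zero_copy";  // mark the variant so it is tuned separately
      dslash.mapped_ghost = true;
      set_mapped = true;
    } else {
      if (!set_mapped) throw std::logic_error("mapped ghost not set");
      dslash.aux = saved_aux;      // restore the original tuning key
      dslash.mapped_ghost = false;
      set_mapped = false;
    }
  }
};
```

Calling `apply(dslash, true)` after the interior kernel and `apply(dslash, false)` once the exterior kernels have run leaves the object in its original state.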
Worker * quda::dslash::aux_worker |
Definition at line 87 of file dslash_quda.cu.
Referenced by quda::ShiftUpdate::apply(), quda::BiCGstabLUpdate::apply(), quda::createDslashEvents(), quda::DslashCoarseLaunch::operator()(), quda::BiCGstabL::operator()(), and quda::MultiShiftCG::operator()().
bool quda::dslash::dslash_comms |
Definition at line 66 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
bool quda::dslash::dslash_copy |
Definition at line 67 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
bool quda::dslash::dslash_exterior_compute |
Definition at line 65 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
bool quda::dslash::dslash_interior_compute |
Definition at line 64 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
bool quda::dslash::dslash_pack_compute |
Definition at line 63 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
bool quda::dslash::dslash_policy_init |
Definition at line 70 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), and quda::dslash::DslashNC< Dslash >::operator()().
cudaEvent_t quda::dslash::dslashStart |
Definition at line 60 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), and quda::destroyDslashEvents().
int quda::dslash::first_active_p2p_policy |
Definition at line 74 of file dslash_quda.cu.
Referenced by quda::dslash::DslashPolicyTune< Dslash >::advanceAux(), quda::createDslashEvents(), quda::dslash::DslashPolicyTune< Dslash >::defaultTuneParam(), quda::dslash::DslashPolicyTune< Dslash >::initTuneParam(), and quda::dslash::DslashNC< Dslash >::operator()().
int quda::dslash::first_active_policy |
Definition at line 73 of file dslash_quda.cu.
Referenced by quda::dslash::DslashPolicyTune< Dslash >::advanceAux(), quda::createDslashEvents(), quda::dslash::DslashPolicyTune< Dslash >::defaultTuneParam(), quda::dslash::DslashPolicyTune< Dslash >::DslashPolicyTune(), quda::dslash::DslashPolicyTune< Dslash >::initTuneParam(), and quda::dslash::DslashNC< Dslash >::operator()().
cudaEvent_t quda::dslash::gatherEnd |
Definition at line 57 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), quda::destroyDslashEvents(), quda::exchangeExtendedGhost(), and quda::shiftColorSpinorField().
cudaEvent_t quda::dslash::gatherStart |
Definition at line 56 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), and quda::destroyDslashEvents().
cudaColorSpinorField * quda::dslash::inSpinor |
static |
Definition at line 34 of file dslash_policy.cuh.
int quda::dslash::it = 0 |
Definition at line 53 of file dslash_quda.cu.
Referenced by bdSVD().
std::vector< QudaP2PPolicy > quda::dslash::p2p_policies |
Definition at line 80 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
cudaEvent_t quda::dslash::packEnd |
Definition at line 55 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), quda::destroyDslashEvents(), and quda::shiftColorSpinorField().
std::vector< QudaDslashPolicy > quda::dslash::policies |
Definition at line 77 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
char quda::dslash::policy_string |
Definition at line 83 of file dslash_quda.cu.
Referenced by quda::createDslashEvents().
cudaEvent_t quda::dslash::scatterEnd |
Definition at line 59 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), quda::destroyDslashEvents(), and quda::shiftColorSpinorField().
cudaEvent_t quda::dslash::scatterStart |
Definition at line 58 of file dslash_quda.cu.
Referenced by quda::createDslashEvents(), and quda::destroyDslashEvents().