|
QUDA
0.9.0
|
Classes | |
| struct | DslashAsync |
| struct | DslashBasic |
| struct | DslashFactory |
| struct | DslashFusedExterior |
| struct | DslashFusedExteriorAsync |
| struct | DslashFusedGDR |
| struct | DslashFusedGDRRecv |
| struct | DslashFusedZeroCopy |
| struct | DslashFusedZeroCopyPack |
| struct | DslashFusedZeroCopyPackGDRRecv |
| struct | DslashGDR |
| struct | DslashGDRRecv |
| struct | DslashNC |
| struct | DslashPolicyImp |
| class | DslashPolicyTune |
| struct | DslashPthreads |
| struct | DslashZeroCopy |
| struct | DslashZeroCopyPack |
| struct | DslashZeroCopyPackGDRRecv |
Functions | |
| void | issueRecv (cudaColorSpinorField &input, const DslashCuda &dslash, cudaStream_t *stream, bool gdr) |
| This helper function simply posts all receives in all directions. | |
| void | issuePack (cudaColorSpinorField &in, const DslashCuda &dslash, int parity, MemoryLocation location, int packIndex) |
| This helper function simply posts the packing kernel needed for halo exchange. | |
| void | issueGather (cudaColorSpinorField &in, const DslashCuda &dslash) |
| This helper function simply posts the device-host memory copies of all halos in all dimensions and directions. | |
| template<typename T > | |
| int | getStreamIndex (const T &dslashParam) |
| Returns a stream index for posting the pack/scatters to. We desire a stream index that is not being used for peer-to-peer communication. This is used by the fused halo dslash kernels where we post all scatters to the same stream so we only have a single event to wait on before the exterior kernel is applied, and by the zero-copy dslash kernels where we want to post the packing kernel to an unused stream. | |
| bool | commsComplete (cudaColorSpinorField &in, const DslashCuda &dslash, int dim, int dir, bool gdr_send, bool gdr_recv, bool zero_copy_recv, bool async, int scatterIndex=-1) |
| Wrapper for querying whether communication in the dslash has finished and, if it has, taking the appropriate action. | |
| template<typename T > | |
| void | completeDslash (const ColorSpinorField &in, const T &dslashParam) |
| Ensure that the dslash is complete. By construction, the dslash will have completed (or is in flight) on this process; however, we must also ensure that no local work begins until any communication in flight from this process to another has completed. This prevents a race condition where we could start updating the local buffers on a subsequent computation before we have finished sending. | |
| void | setMappedGhost (DslashCuda &dslash, cudaColorSpinorField &in, bool to_mapped) |
| Set the ghosts to the mapped CPU ghost buffer, or unset them if already set. Note this must not be called until after the interior dslash has been called, since that sets the peer-to-peer ghost pointers, and this needs to be done without the mapped ghost enabled. | |
| static std::vector< QudaDslashPolicy > | policies (static_cast< int >(QudaDslashPolicy::QUDA_DSLASH_POLICY_DISABLED), QudaDslashPolicy::QUDA_DSLASH_POLICY_DISABLED) |
| static std::vector< QudaP2PPolicy > | p2p_policies (static_cast< int >(QudaP2PPolicy::QUDA_P2P_POLICY_DISABLED), QudaP2PPolicy::QUDA_P2P_POLICY_DISABLED) |
| void | enable_policy (QudaDslashPolicy p) |
| void | disable_policy (QudaDslashPolicy p) |
Variables | |
| static bool | dslash_init = false |
| static int | config = 0 |
| static int | first_active_policy = static_cast<int>(QudaDslashPolicy::QUDA_DSLASH_POLICY_DISABLED) |
| static int | first_active_p2p_policy = static_cast<int>(QudaP2PPolicy::QUDA_P2P_POLICY_DISABLED) |
|
strong |
Definition at line 1779 of file dslash_policy.cuh.
|
strong |
| Enumerator | |
|---|---|
| QUDA_P2P_DEFAULT | |
| QUDA_P2P_COPY_ENGINE | |
| QUDA_P2P_REMOTE_WRITE | |
| QUDA_P2P_POLICY_DISABLED | |
Definition at line 1801 of file dslash_policy.cuh.
|
inline |
Wrapper for querying whether communication in the dslash has finished and, if it has, taking the appropriate action.
| [in,out] | in | Field being communicated |
| [in] | dslash | The dslash object |
| [in] | dim | Dimension we are working on |
| [in] | dir | Direction we are working on |
| [in] | gdr_send | Whether GPU Direct RDMA is being used for sending |
| [in] | gdr_recv | Whether GPU Direct RDMA is being used for receiving |
| [in] | zero_copy_recv | Whether we are using zero-copy on the receive end (and hence do not need to do CPU->GPU copy) |
| [in] | async | Whether GPU Direct Async is being used |
| [in] | scatterIndex | The stream index used for posting the host-to-device memory copy in |
Definition at line 279 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), dslash::commsEnd_h, dim, dslash_comms, dslash_copy, errorQuda, in, quda::Nstream, PROFILE, quda::QUDA_PROFILE_COMMS_QUERY, quda::QUDA_PROFILE_SCATTER, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaStreamWaitEvent(), stream, and streams.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashBasic::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopy::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().
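The "query, then act on completion" pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in dslash_policy.cuh: the name `comms_complete` and the two callbacks are hypothetical stand-ins for the real communication query and the host-to-device copy, and the only behaviour taken from the parameter list is that a zero-copy receive needs no CPU->GPU copy.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: poll one communication and react once it has finished.
//  - query:          returns true when the receive has completed (stand-in for the real query)
//  - zero_copy_recv: if true, the data is already visible to the GPU, so no copy is posted
//  - post_copy:      stand-in for posting the host-to-device copy on some stream
bool comms_complete(const std::function<bool()> &query,
                    bool zero_copy_recv,
                    const std::function<void()> &post_copy)
{
  if (!query()) return false;           // still in flight: caller should poll again later
  if (!zero_copy_recv) post_copy();     // finished: move the halo to the device if needed
  return true;                          // tell the caller this dimension/direction is done
}
```

A caller would typically invoke such a wrapper repeatedly for each dimension and direction until it returns true, which keeps the polling loop itself free of policy-specific branches.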


|
inline |
Ensure that the dslash is complete. By construction, the dslash will have completed (or is in flight) on this process; however, we must also ensure that no local work begins until any communication in flight from this process to another has completed. This prevents a race condition where we could start updating the local buffers on a subsequent computation before we have finished sending.
Definition at line 331 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), dim, in, quda::Nstream, PROFILE, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaStreamWaitEvent(), and streams.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashBasic::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashPthreads::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPackGDRRecv::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().


| void anonymous_namespace{dslash_policy.cuh}::disable_policy | ( | QudaDslashPolicy | p | ) |
Definition at line 1891 of file dslash_policy.cuh.
References p, and policies().

| void anonymous_namespace{dslash_policy.cuh}::enable_policy | ( | QudaDslashPolicy | p | ) |
Definition at line 1887 of file dslash_policy.cuh.
References p, and policies().
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune().
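The bodies of `enable_policy` and `disable_policy` are not shown on this page. Given that `policies` is initialized with one slot per policy, each holding the `DISABLED` value, one plausible reading is that enabling a policy stores it in its own slot and disabling resets that slot. The sketch below illustrates that idea only; the `Policy` enum and its enumerators, and the `is_enabled` helper, are hypothetical stand-ins for `QudaDslashPolicy`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for QudaDslashPolicy. The sentinel comes last, so its
// integer value equals the number of real policies.
enum class Policy { Basic, FusedExterior, ZeroCopy, Disabled };

// One slot per real policy, every slot initially marked Disabled
// (same shape as the `policies` initializer listed above).
static std::vector<Policy> policies(static_cast<std::size_t>(Policy::Disabled), Policy::Disabled);

// Assumed behaviour: enabling writes the policy into its own slot.
void enable_policy(Policy p)
{
  assert(p != Policy::Disabled);  // the sentinel has no slot
  policies[static_cast<std::size_t>(p)] = p;
}

// Assumed behaviour: disabling resets the slot to the sentinel.
void disable_policy(Policy p)
{
  assert(p != Policy::Disabled);
  policies[static_cast<std::size_t>(p)] = Policy::Disabled;
}

// Illustrative helper: a policy is active when its slot holds itself.
bool is_enabled(Policy p)
{
  return p != Policy::Disabled && policies[static_cast<std::size_t>(p)] == p;
}
```

With this layout a tuner can walk the vector and skip every slot still holding the sentinel, which would be consistent with the `first_active_policy` variable documented on this page.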


|
inline |
Returns a stream index for posting the pack/scatters to. We desire a stream index that is not being used for peer-to-peer communication. This is used by the fused halo dslash kernels where we post all scatters to the same stream so we only have a single event to wait on before the exterior kernel is applied, and by the zero-copy dslash kernels where we want to post the packing kernel to an unused stream.
Definition at line 240 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), fused_exterior_ndeg_tm_dslash_cuda_gen::i, and index().
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopy::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().
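The selection described above can be sketched as follows. This is a minimal illustration, not the code in dslash_policy.cuh: the `CommsLayout` struct, the `pick_stream_index` name, the search order, and the assumption that streams are numbered `2 * dimension + direction` are all hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical description of the communication layout of one process.
struct CommsLayout {
  std::array<bool, 4> partitioned;         // does dimension d take part in halo exchange?
  std::array<std::array<bool, 2>, 4> p2p;  // p2p[d][dir]: is that neighbour reached peer-to-peer?
};

// Return the index of a stream that is not tied up with peer-to-peer traffic.
// Assumes one stream per (dimension, direction) pair, numbered 2 * d + dir.
int pick_stream_index(const CommsLayout &c)
{
  for (int d = 0; d < 4; ++d) {
    if (!c.partitioned[d]) continue;            // no halo exchange in this dimension
    for (int dir = 0; dir < 2; ++dir) {
      if (!c.p2p[d][dir]) return 2 * d + dir;   // first stream free of peer-to-peer use
    }
  }
  return 0;  // every partitioned direction is peer-to-peer: fall back to a valid index
}
```

The fallback matters because the caller always needs a usable stream, even on a fully peer-to-peer connected process.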


|
inline |
This helper function simply posts the device-host memory copies of all halos in all dimensions and directions.
| [out] | in | Field whose halos we are communicating |
| [in] | dslash | The dslash object |
Definition at line 205 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), dslash_copy, dslash::dslashStart, event, dslash::gatherEnd, quda::getKernelPackT(), fused_exterior_ndeg_tm_dslash_cuda_gen::i, in, dslash::packEnd, PROFILE, quda::QUDA_PROFILE_EVENT_RECORD, quda::QUDA_PROFILE_GATHER, quda::QUDA_PROFILE_STREAM_WAIT_EVENT, quda::qudaEventRecord(), quda::qudaStreamWaitEvent(), and streams.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashBasic::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDRRecv::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedGDRRecv::operator()().


|
inline |
This helper function simply posts the packing kernel needed for halo exchange.
| [out] | in | Field that we are packing |
| [in] | dslash | The dslash object |
| [in] | parity | Field parity |
| [in] | location | Memory location where we are packing to |
| [in] | packIndex | Stream index where the packing kernel will run |
Definition at line 165 of file dslash_policy.cuh.
References comm_peer2peer_enabled(), quda::Device, dim, dslash_pack_compute, errorQuda, quda::getKernelPackT(), quda::Host, fused_exterior_ndeg_tm_dslash_cuda_gen::i, in, deg_tm_dslash_cuda_gen::pack, dslash::packEnd, parity, PROFILE, QUDA_MAX_DIM, quda::QUDA_PROFILE_EVENT_RECORD, quda::QUDA_PROFILE_PACK_KERNEL, quda::qudaEventRecord(), quda::Remote, and streams.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashBasic::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopy::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().


|
inline |
This helper function simply posts all receives in all directions.
| [out] | input | Field that we are doing halo exchange |
| [in] | dslash | The dslash object |
| [in] | stream | Stream where the receive is being posted (effectively ignored) |
| [in] | gdr | Whether we are using GPU Direct RDMA or not |
Definition at line 146 of file dslash_policy.cuh.
References dslash_comms, fused_exterior_ndeg_tm_dslash_cuda_gen::i, PROFILE, quda::QUDA_PROFILE_COMMS_START, and stream.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashBasic::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedExterior::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDR::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPack::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopyPackGDRRecv::operator()(), anonymous_namespace{dslash_policy.cuh}::DslashZeroCopy::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().

|
static |
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::advanceAux(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::apply(), and anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune().

|
static |
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::advanceAux(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::apply(), disable_policy(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune(), and enable_policy().

|
inline |
Set the ghosts to the mapped CPU ghost buffer, or unset them if already set. Note this must not be called until after the interior dslash has been called, since that sets the peer-to-peer ghost pointers, and this needs to be done without the mapped ghost enabled.
| [in,out] | dslash | The dslash object |
| [in,out] | in | The ColorSpinorField source |
| [in] | to_mapped | Whether we are switching to mapped ghosts or not |
Definition at line 355 of file dslash_policy.cuh.
References errorQuda, in, and strcpy().
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashZeroCopy::operator()(), and anonymous_namespace{dslash_policy.cuh}::DslashFusedZeroCopy::operator()().


|
static |
Definition at line 1881 of file dslash_policy.cuh.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::apply(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::defaultTuneParam(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune(), and anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::initTuneParam().
|
static |
Definition at line 1879 of file dslash_policy.cuh.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune().
|
static |
Definition at line 1885 of file dslash_policy.cuh.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::advanceAux(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::defaultTuneParam(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune(), and anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::initTuneParam().
|
static |
Definition at line 1883 of file dslash_policy.cuh.
Referenced by anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::advanceAux(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::defaultTuneParam(), anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::DslashPolicyTune(), and anonymous_namespace{dslash_policy.cuh}::DslashPolicyTune::initTuneParam().