40 namespace threadblock {
    46   typename ThreadblockShape_,
    48   typename InstructionShape_,
    73       !(ThreadblockShape::kM % WarpShape::kM) &&
    74       !(ThreadblockShape::kN % WarpShape::kN), "Divisibility");
    78       ThreadblockShape::kM / WarpShape::kM,
    79       ThreadblockShape::kN / WarpShape::kN,
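The fragments above check that the threadblock tile divides evenly into warp tiles and then take the quotients as the warp arrangement. The following is a minimal, standalone sketch of that pattern, not the library's code: `Shape` is a hypothetical stand-in for the GEMM shape type, `WarpArrangement` is an illustrative name, and the formula `kThreads = WarpCount::kCount * kWarpSize` with `kWarpSize = 32` is an assumption based on the member names listed on this page.

```cpp
// Hypothetical stand-in for a GEMM shape type (not the CUTLASS definition).
template <int M, int N, int K>
struct Shape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
  static constexpr int kCount = M * N * K;  // total number of elements
};

// Illustrative helper: derives the warp arrangement from the tile shapes.
template <typename ThreadblockShape, typename WarpShape, int PartitionsK>
struct WarpArrangement {
  // Both the M and N extents of the threadblock tile must be
  // whole multiples of the warp tile.
  static_assert(
      !(ThreadblockShape::kM % WarpShape::kM) &&
      !(ThreadblockShape::kN % WarpShape::kN),
      "Divisibility");

  // Number of warps along M, N, and K.
  using WarpCount = Shape<
      ThreadblockShape::kM / WarpShape::kM,
      ThreadblockShape::kN / WarpShape::kN,
      PartitionsK>;

  static constexpr int kWarpSize = 32;  // assumption: 32 threads per warp
  static constexpr int kThreads = WarpCount::kCount * kWarpSize;
};

// Usage: a 128x128x32 threadblock tile split into 64x64x32 warp tiles.
using Arrangement = WarpArrangement<Shape<128, 128, 32>, Shape<64, 64, 32>, 1>;
static_assert(Arrangement::WarpCount::kCount == 4, "2 x 2 x 1 warps");
static_assert(Arrangement::kThreads == 128, "4 warps of 32 threads");
```

Checking both `kM` and `kN` matters here: the quotient along N is only meaningful if the N extent also divides evenly.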
 Definition: output_tile_thread_map.h:228
Definition: aligned_buffer.h:35
Tuple defining point in output tile. 
Definition: output_tile_thread_map.h:57
Definition: default_thread_map_wmma_tensor_op.h:66
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
Element_ Element
Definition: default_thread_map_wmma_tensor_op.h:59
Defines common types used for all GEMM-like operators. 
static int const kCount
Definition: include/cutlass/gemm/gemm.h:67
static int const kThreads
Number of participating threads. 
Definition: default_thread_map_wmma_tensor_op.h:84
static int const kPartitionsK
Definition: default_thread_map_wmma_tensor_op.h:58
Defines the size of an element in bits. 
Definition: numeric_types.h:42
InstructionShape_ InstructionShape
Definition: default_thread_map_wmma_tensor_op.h:57
static int const kElementsPerAccess
Definition: default_thread_map_wmma_tensor_op.h:60
Shape of a matrix multiply-add operation. 
Definition: include/cutlass/gemm/gemm.h:57
ThreadblockShape_ ThreadblockShape
Definition: default_thread_map_wmma_tensor_op.h:55
Defines the optimal thread map for Wmma TensorOp accumulator layouts. 
Definition: default_thread_map_wmma_tensor_op.h:53
WarpShape_ WarpShape
Definition: default_thread_map_wmma_tensor_op.h:56
static int const kWarpSize
Definition: default_thread_map_wmma_tensor_op.h:70
Defines layout functions used by TensorRef and derived classes for pitch-linear memory. 
static int const kTensorOpRows
Wmma Tensor Operations fundamentally perform operations on InstructionShape::kM rows. 
Definition: default_thread_map_wmma_tensor_op.h:69
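As a rough illustration of the `kTensorOpRows` note, the sketch below takes `InstructionShape::kM` as the row granularity and counts how many such row groups fit in one warp tile. `TileShape`, `RowGrouping`, and `kRowGroups` are hypothetical names introduced for this example, and the division `WarpShape::kM / kTensorOpRows` is an assumption about how the value might be used, not something this page confirms.

```cpp
// Hypothetical stand-in for a GEMM shape type (not the CUTLASS definition).
template <int M, int N, int K>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Illustrative helper: row granularity of one WMMA-style instruction
// and the number of row groups covering a warp tile.
template <typename WarpShape, typename InstructionShape>
struct RowGrouping {
  // Each instruction operates on InstructionShape::kM rows.
  static constexpr int kTensorOpRows = InstructionShape::kM;

  static_assert(!(WarpShape::kM % kTensorOpRows),
                "Warp tile rows must be a multiple of the instruction rows");

  // Assumed usage: row groups needed to cover the warp tile along M.
  static constexpr int kRowGroups = WarpShape::kM / kTensorOpRows;
};

// Usage: a 64x64x32 warp tile built from 16x16x16 instructions.
using Rows = RowGrouping<TileShape<64, 64, 32>, TileShape<16, 16, 16>>;
static_assert(Rows::kTensorOpRows == 16, "rows per instruction");
static_assert(Rows::kRowGroups == 4, "row groups per warp tile");
```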