59 namespace threadblock {
    66   typename WarpMmaTensorOp_,
    80   using LayoutC = typename WarpMmaTensorOp::LayoutC;
    89     typename WarpMmaTensorOp::Shape,
   102     typename WarpMmaTensorOp::Shape,
   109     typename WarpMmaTensorOp::Shape,
   110     gemm::GemmShape<32, 32, 4>,
   117   static_assert(kSharedMemAlignment == 8, "Shared memory alignment must be 8B");
   120     typename OutputTileThreadMap::CompactedThreadMap,
   126   using Padding = typename WarpTileIterator::Padding;
 Templates implementing loading of tiles from pitch-linear rank=2 tensors. 
WarpMmaTensorOp_ WarpMmaTensorOp
Definition: default_epilogue_volta_tensor_op.h:74
Definition: aligned_buffer.h:35
static int const kPartitionsK
Definition: default_epilogue_volta_tensor_op.h:75
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
This defines a "fragment" iterator for visiting the fragments of an accumulator tile that participate...
Defines common types used for all GEMM-like operators. 
static int const kElementsPerAccess
Definition: default_epilogue_volta_tensor_op.h:77
Functor performing conversion operations used by epilogues. 
typename WarpMmaTensorOp::LayoutC LayoutC
Definition: default_epilogue_volta_tensor_op.h:80
Shape_ Shape
Definition: default_epilogue_volta_tensor_op.h:73
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe ...
cutlass::epilogue::threadblock::SharedLoadIterator< typename OutputTileThreadMap::CompactedThreadMap, ElementAccumulator, kSharedMemAlignment > SharedLoadIterator
Definition: default_epilogue_volta_tensor_op.h:123
Functor performing linear combination operations used by epilogues. 
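As an illustration of what a linear combination output functor does in an epilogue, here is a minimal sketch that scales an accumulator by alpha and adds beta times the existing source element. The name LinearCombinationSketch and its members are hypothetical stand-ins for illustration, not the CUTLASS class or its interface.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an epilogue output functor.
// It computes: result = alpha * accumulator + beta * source.
template <typename ElementOutput, typename ElementCompute = float>
struct LinearCombinationSketch {
  ElementCompute alpha;  // scale applied to the accumulator
  ElementCompute beta;   // scale applied to the source (existing output) element

  ElementOutput operator()(ElementCompute accumulator, ElementOutput source) const {
    return static_cast<ElementOutput>(
        alpha * accumulator + beta * static_cast<ElementCompute>(source));
  }
};
```

In a real epilogue the functor would operate on fragments of several elements at once; the scalar form above only shows the arithmetic.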
Defines the size of an element in bits. 
Definition: numeric_types.h:42
Template for reading and writing tiles of accumulators to shared memory. 
Definition: tile_iterator_volta_tensor_op.h:52
cutlass::epilogue::warp::FragmentIteratorVoltaTensorOp< typename WarpMmaTensorOp::Shape, gemm::GemmShape< 32, 32, 4 >, ElementAccumulator, LayoutC > AccumulatorFragmentIterator
Definition: default_epilogue_volta_tensor_op.h:106
static int const kSharedMemAlignment
Definition: default_epilogue_volta_tensor_op.h:115
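The listing asserts that kSharedMemAlignment equals 8 bytes, but this excerpt does not show how the value is computed. A minimal sketch, assuming the alignment is the byte width of one vectorized access (element size in bits times elements per access, divided by 8). SizeOfBitsSketch and alignment_bytes are hypothetical names, not CUTLASS APIs.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for a "size of an element in bits" trait.
template <typename T>
struct SizeOfBitsSketch {
  static constexpr int value = static_cast<int>(sizeof(T)) * CHAR_BIT;
};

// Hypothetical helper: bytes spanned by one access of `elements_per_access`
// elements that are each `element_bits` wide.
constexpr int alignment_bytes(int element_bits, int elements_per_access) {
  return element_bits * elements_per_access / 8;
}

// Under this assumption, two 32-bit accumulators per access give 8 bytes,
// which is the value the static_assert in the listing requires.
static_assert(alignment_bytes(SizeOfBitsSketch<float>::value, 2) == 8,
              "Shared memory alignment must be 8B");
```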
cutlass::epilogue::warp::TileIteratorVoltaTensorOp< typename WarpMmaTensorOp::Shape, gemm::GemmShape< 32, 32, 4 >, ElementAccumulator, LayoutC > WarpTileIterator
Definition: default_epilogue_volta_tensor_op.h:113
Top-level include for all CUTLASS numeric types. 
Definition: fragment_iterator_volta_tensor_op.h:61
Shape of a matrix multiply-add operation. 
Definition: include/cutlass/gemm/gemm.h:57
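To make the gemm::GemmShape< 32, 32, 4 > argument in the iterator typedefs easier to read, here is a minimal sketch of a compile-time shape type carrying M, N, and K extents. GemmShapeSketch and its member names are assumptions for illustration, not the definition in gemm.h.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a compile-time GEMM shape.
template <int M, int N, int K>
struct GemmShapeSketch {
  static constexpr int kM = M;       // rows of the output tile
  static constexpr int kN = N;       // columns of the output tile
  static constexpr int kK = K;       // depth of the multiply-accumulate
  static constexpr int kMN = M * N;  // elements in the M-by-N output tile
};

// The fixed 32x32x4 shape that appears in the listing.
using Shape32x32x4 = GemmShapeSketch<32, 32, 4>;
static_assert(Shape32x32x4::kMN == 1024, "32 x 32 output tile");
```

Encoding the extents as template parameters lets dependent types, such as the warp-level iterators above, compute their layouts entirely at compile time.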
Defines sensible defaults for epilogues for TensorOps. 
Definition: default_epilogue_volta_tensor_op.h:71
Epilogue operator without splitk. 
Definition: epilogue.h:74
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
Definition: epilogue/threadblock/predicated_tile_iterator.h:65
cutlass::epilogue::threadblock::PredicatedTileIterator< OutputTileThreadMap, ElementOutput > OutputTileIterator
Definition: default_epilogue_volta_tensor_op.h:99
typename WarpMmaTensorOp::ElementC ElementAccumulator
Definition: default_epilogue_volta_tensor_op.h:81
Defines the optimal thread map for TensorOp accumulator layouts. 
Definition: default_thread_map_volta_tensor_op.h:52
typename WarpTileIterator::Padding Padding
Hard-coded padding elements added. 
Definition: default_epilogue_volta_tensor_op.h:126
typename cutlass::epilogue::threadblock::DefaultThreadMapVoltaTensorOp< Shape, typename WarpMmaTensorOp::Shape, kPartitionsK, ElementOutput, kElementsPerAccess, ElementAccumulator >::Type OutputTileThreadMap
Definition: default_epilogue_volta_tensor_op.h:94
Definition: shared_load_iterator.h:61
typename OutputOp::ElementOutput ElementOutput
Definition: default_epilogue_volta_tensor_op.h:79
Functor performing reduction operations used by epilogues. 
Basic include for CUTLASS. 
OutputOp_ OutputOp
Definition: default_epilogue_volta_tensor_op.h:76