batmat
main
Batched linear algebra routines
Namespaces | |
| namespace | detail |
Classes | |
| struct | KernelConfig |
Functions | |
| template<class T, class Abi, KernelConfig Conf, index_t RowsReg, index_t ColsReg, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| void | gemm_copy_microkernel (const uview< const T, Abi, OA > A, const uview< const T, Abi, OB > B, const std::optional< uview< const T, Abi, OC > > C, const uview< T, Abi, OD > D, const index_t k) noexcept |
| Generalized matrix multiplication D = C ± A⁽ᵀ⁾ B⁽ᵀ⁾. Single register block. | |
| template<class T, class Abi, KernelConfig Conf, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| void | gemm_copy_register (const view< const T, Abi, OA > A, const view< const T, Abi, OB > B, const std::optional< view< const T, Abi, OC > > C, const view< T, Abi, OD > D) noexcept |
| Generalized matrix multiplication D = C ± A⁽ᵀ⁾ B⁽ᵀ⁾. Using register blocking. | |
Variables | |
| template<class T, class Abi> | |
| constexpr index_t | ColsReg = RowsReg<T, Abi> |
| template<class T, class Abi, KernelConfig Conf, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| const constinit decltype(detail::gemm_copy_lut< T, Abi, Conf, OA, OB, OC, OD >) | gemm_copy_lut = detail::gemm_copy_lut<T, Abi, Conf, OA, OB, OC, OD> |
| template<MatrixStructure Struc> | |
| constexpr auto | first_column |
| template<index_t ColsReg, MatrixStructure Struc> | |
| constexpr auto | last_column |
| template<class T, class Abi> | |
| constexpr index_t | RowsReg = 5 |
| Register block size of the matrix-matrix multiplication micro-kernels. | |
| template<class T, class Abi> requires (datapar::simd_size<T, Abi>::value * sizeof(T) > 32) | |
| constexpr index_t | RowsReg< T, Abi > = 3 |
| struct batmat::linalg::micro_kernels::gemm::KernelConfig |
| Class Members | ||
|---|---|---|
| bool | negate = false | |
| int | shift_A = 0 | |
| int | rotate_B = 0 | |
| int | rotate_C = 0 | |
| int | rotate_D = rotate_C | |
| int | mask_D = rotate_D | |
| MatrixStructure | struc_A = MatrixStructure::General | |
| MatrixStructure | struc_B = MatrixStructure::General | |
| MatrixStructure | struc_C = MatrixStructure::General | |
Register block size of the matrix-matrix multiplication micro-kernels.
AVX-512 has 32 vector registers. We use 25 of them for a 5×5 accumulator block of matrix C, leaving some registers for loading A and B.
AVX2 has 16 vector registers. We use 9 of them for a 3×3 accumulator block of matrix C, leaving some registers for loading A and B.
Assuming that the platform has at least 16 vector registers, we use 9 of them for a 3×3 accumulator block of matrix C, leaving some registers for loading A and B.
NEON has 32 vector registers. We use 16 of them for a 4×4 accumulator block of matrix C, leaving plenty of registers for loading A and B.
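
The register budgets quoted above can be checked with a few lines of compile-time arithmetic. This is only a sketch: `spare_registers` is a hypothetical helper that does not exist in batmat, and the register counts and block sizes are the ones stated in the text.

```cpp
// Hypothetical helper (not part of batmat): number of vector registers left
// for loading A and B after reserving a rows×cols accumulator block.
constexpr int spare_registers(int total_regs, int rows, int cols) {
    return total_regs - rows * cols;
}

// Register counts and block sizes as quoted in the text.
static_assert(spare_registers(32, 5, 5) == 7);  // AVX-512: 5×5 block
static_assert(spare_registers(16, 3, 3) == 7);  // AVX2 and 16-register platforms: 3×3 block
static_assert(spare_registers(32, 4, 4) == 16); // NEON: 4×4 block

// A 4×4 block would use up all 16 registers, which is why the
// 16-register platforms stop at 3×3.
static_assert(spare_registers(16, 4, 4) == 0);
```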
Definition at line 13 of file avx-512.hpp.
Definition at line 17 of file avx-512.hpp.