batmat
0.0.17
Batched linear algebra routines
Classes | |
| struct | KernelConfig |
Functions | |
| template<class T, class Abi, KernelConfig Conf, index_t RowsReg, index_t ColsReg, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| std::conditional_t< Conf.track_zeros, std::pair< index_t, index_t >, void > | gemm_diag_copy_microkernel (const uview< const T, Abi, OA > A, const uview< const T, Abi, OB > B, const std::optional< uview< const T, Abi, OC > > C, const uview< T, Abi, OD > D, const uview_vec< const T, Abi > d, const index_t k) noexcept |
| Generalized matrix multiplication D = C ± A⁽ᵀ⁾ diag(d) B⁽ᵀ⁾. Single register block. | |
| template<class T, class Abi, KernelConfig Conf, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| void | gemm_diag_copy_register (const view< const T, Abi, OA > A, const view< const T, Abi, OB > B, const std::optional< view< const T, Abi, OC > > C, const view< T, Abi, OD > D, view< const T, Abi > d) noexcept |
| Generalized matrix multiplication D = C ± A⁽ᵀ⁾ diag(d) B⁽ᵀ⁾. Using register blocking. | |
Variables | |
| template<class T, class Abi> | |
| constexpr index_t | ColsReg = RowsReg<T, Abi> |
| template<class T, class Abi, KernelConfig Conf, StorageOrder OA, StorageOrder OB, StorageOrder OC, StorageOrder OD> | |
| const constinit auto | gemm_diag_copy_lut |
| template<MatrixStructure Struc> | |
| constexpr auto | first_column |
| template<index_t ColsReg, MatrixStructure Struc> | |
| constexpr auto | last_column |
| template<class T, class Abi> | |
| constexpr index_t | RowsReg |
| Register block size of the matrix-matrix multiplication micro-kernels. | |
| struct batmat::linalg::micro_kernels::gemm_diag::KernelConfig |
| Class Members | ||
|---|---|---|
| bool | negate = false | |
| bool | track_zeros = false | |
| MatrixStructure | struc_C = MatrixStructure::General | |
| gemm_diag_copy_microkernel() | noexcept |
Generalized matrix multiplication D = C ± A⁽ᵀ⁾ diag(d) B⁽ᵀ⁾. Single register block.
Definition at line 35 of file gemm-diag.tpp.
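As a plain reference for what D = C ± A diag(d) B computes, here is a scalar sketch. It is not the batmat micro-kernel: `Mat` and `gemm_diag_ref` are hypothetical names, the matrices are small fixed-size row-major arrays rather than batched views, transposition is ignored, and an empty `C` is assumed to mean a zero matrix.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical fixed-size, row-major matrix type (not a batmat view).
template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<double, C>, R>;

// Reference for D = C ± A diag(d) B with A (M×K), d (K), B (K×N), C and D (M×N).
// Assumption: an empty optional C is treated as a zero matrix.
template <bool Negate = false, std::size_t M, std::size_t K, std::size_t N>
constexpr Mat<M, N> gemm_diag_ref(const Mat<M, K> &A, const Mat<K, N> &B,
                                  const std::optional<Mat<M, N>> &C,
                                  const std::array<double, K> &d) {
    Mat<M, N> D{};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // (A diag(d) B)(i, j) = sum over l of A(i, l) * d(l) * B(l, j)
            double acc = 0;
            for (std::size_t l = 0; l < K; ++l)
                acc += A[i][l] * d[l] * B[l][j];
            const double c = C ? (*C)[i][j] : 0.0;
            D[i][j] = Negate ? c - acc : c + acc;
        }
    }
    return D;
}
```

In this sketch the `C` argument has to be passed as an explicit `std::optional<Mat<M, N>>` so that the template parameters can be deduced; `gemm_diag_ref<true>(A, B, C, d)` selects the minus sign.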
| gemm_diag_copy_register() | noexcept |
Generalized matrix multiplication D = C ± A⁽ᵀ⁾ diag(d) B⁽ᵀ⁾. Using register blocking.
Definition at line 108 of file gemm-diag.tpp.
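Register blocking can be illustrated as follows: the output is split into tiles of at most `BlockRows` × `BlockCols` entries, and each tile is accumulated in a small local array (a stand-in for vector registers) before being written back once. This is a sketch under stated assumptions, not the library's implementation: `Mat`, `BlockRows`, `BlockCols`, `micro_block` and `gemm_diag_blocked` are hypothetical names, the data is scalar and row-major, and only the plus sign is handled.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical fixed-size, row-major matrix type (not a batmat view).
template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<double, C>, R>;

// Hypothetical block size; 3×3 mirrors the AVX2 case described on this page.
constexpr std::size_t BlockRows = 3, BlockCols = 3;

// One register block: D(i0.., j0..) = C(i0.., j0..) + A diag(d) B, clipped at the edges.
template <std::size_t M, std::size_t K, std::size_t N>
constexpr void micro_block(const Mat<M, K> &A, const Mat<K, N> &B,
                           const Mat<M, N> &C, Mat<M, N> &D,
                           const std::array<double, K> &d,
                           std::size_t i0, std::size_t j0) {
    const std::size_t rows = std::min(BlockRows, M - i0);
    const std::size_t cols = std::min(BlockCols, N - j0);
    double acc[BlockRows][BlockCols] = {}; // stand-in for the accumulator registers
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            acc[i][j] = C[i0 + i][j0 + j];
    // The inner dimension is the outer loop, so the accumulators stay live throughout.
    for (std::size_t l = 0; l < K; ++l)
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                acc[i][j] += A[i0 + i][l] * d[l] * B[l][j0 + j];
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            D[i0 + i][j0 + j] = acc[i][j];
}

// Driver: visit every block of the output exactly once.
template <std::size_t M, std::size_t K, std::size_t N>
constexpr Mat<M, N> gemm_diag_blocked(const Mat<M, K> &A, const Mat<K, N> &B,
                                      const Mat<M, N> &C,
                                      const std::array<double, K> &d) {
    Mat<M, N> D{};
    for (std::size_t i0 = 0; i0 < M; i0 += BlockRows)
        for (std::size_t j0 = 0; j0 < N; j0 += BlockCols)
            micro_block(A, B, C, D, d, i0, j0);
    return D;
}
```

For a 4×4 output with 3×3 blocks, the driver issues one full block and three partial edge blocks.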
| ColsReg | constexpr |
Definition at line 36 of file gemm-diag.hpp.
| gemm_diag_copy_lut | inline constinit |
Definition at line 16 of file gemm-diag.tpp.
| first_column | inline constexpr |
Definition at line 22 of file gemm-diag.tpp.
| last_column | inline constexpr |
Definition at line 26 of file gemm-diag.tpp.
| RowsReg | inline constexpr |
Register block size of the matrix-matrix multiplication micro-kernels.
AVX-512 has 32 vector registers; we use 25 of them for a 5×5 accumulator block of matrix C, leaving 7 registers for loading A and B.
AVX2 has 16 vector registers; we use 9 of them for a 3×3 accumulator block of matrix C, leaving 7 registers for loading A and B.
On other platforms, we assume that at least 16 vector registers are available and use 9 of them for a 3×3 accumulator block of matrix C, leaving at least 7 registers for loading A and B.
NEON has 32 vector registers; we use 16 of them for a 4×4 accumulator block of matrix C, leaving 16 registers for loading A and B.
Definition at line 13 of file avx-512.hpp.
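The per-platform choices above can be summarized in a compile-time sketch. It is an illustration rather than the library's definition: the documented `RowsReg` is a variable template over `T` and `Abi`, whereas this sketch selects on the compiler-predefined macros `__AVX512F__`, `__AVX2__` and `__ARM_NEON`, and the names `vector_registers`, `rows_reg` and `cols_reg` are hypothetical.

```cpp
#include <cstddef>

// Pick the block size from the target instruction set (sketch only).
// AVX-512 is tested first because such targets also define __AVX2__.
#if defined(__AVX512F__)
inline constexpr std::size_t vector_registers = 32;
inline constexpr std::size_t rows_reg = 5; // 5×5 = 25 accumulators
#elif defined(__AVX2__)
inline constexpr std::size_t vector_registers = 16;
inline constexpr std::size_t rows_reg = 3; // 3×3 = 9 accumulators
#elif defined(__ARM_NEON)
inline constexpr std::size_t vector_registers = 32;
inline constexpr std::size_t rows_reg = 4; // 4×4 = 16 accumulators
#else
inline constexpr std::size_t vector_registers = 16; // assumed lower bound
inline constexpr std::size_t rows_reg = 3;          // 3×3 = 9 accumulators
#endif

// Square register block, matching ColsReg = RowsReg on this page.
inline constexpr std::size_t cols_reg = rows_reg;

// The accumulator block must leave registers free for loading A and B.
static_assert(rows_reg * cols_reg < vector_registers,
              "accumulator block does not fit in the register file");
```

The `static_assert` encodes the constraint behind each choice: the accumulator block has to fit in the register file with room left over for operands.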