Julia mixed-precision GEMM codegen meets and exceeds cuBLAS

General Matrix Multiplication (GEMM) kernels take center stage in high-performance
computing and machine learning. Recent NVIDIA GPUs include GEMM accelerators, such as
NVIDIA’s Tensor Cores. In this paper, we show how it is possible to program these
acce…

Similar

Syntactic loop fusion in Julia

After a lengthy design process and preliminary foundations in Julia 0.5, Julia 0.6 includes new facilities for writing code in the “vectorized” style (familiar from MATLAB, NumPy, R, etc.) while avoiding the overhead that this style of programming...