Julia mixed precision GEMM codegen meets and exceeds CUBLAS

General Matrix Multiplication or GEMM kernels take center place in high performance
computing and machine learning. Recent NVIDIA GPUs include GEMM accelerators, such as
NVIDIA’s Tensor Cores. In this paper we show how it is possible to program these
acce… Read more

Similar

Exploring Lazy Evaluation with Julia

“Any sufficiently advanced technology is indistinguishable from magic,” go the famous words of Arthur C. Clarke. Often when we see magic, a small dose of knowledge is enough to wash away our wide-eyed awe and bring us back to sweet, comfortable cynicism. ... (more…)

Read more »