Flexible Performant GEMM Kernels on GPUs in Native Julia

General Matrix Multiplication (GEMM) kernels are central to high-performance
computing and machine learning. Recent NVIDIA GPUs include GEMM accelerators, such as
NVIDIA’s Tensor Cores. In this paper we show how it is possible to program these
acce…

Similar

JuMPing at Gcd, with Julia

Recently, I was teaching my kids how to compute the gcd (Greatest Common Divisor). Instead of just teaching the mechanics of the calculation, I wanted to show them some interesting properties of the gcd.
