Meta AI Megabyte: Predicting Million-Byte Sequences with Multiscale Transformers

Autoregressive transformers are spectacular models for short sequences but
scale poorly to long sequences such as high-resolution images, podcasts, code,
or books. We propose Megabyte, a multi-scale decoder architecture that enables
end-to-end differenti…
