- Transformers Can Represent n-gram Language Models - https://arxiv.org/abs/2404.14994
- Understanding Transformers via N-Gram Statistics - https://www.researchgate.net/publication/382204056_Understanding_Transformers_via_N-Gram_Statistics
- [[Decoder-Only Transformers as Differentiable n-gram Models]]