Key points:
1. [[Differential attention reduces noise in the attention map]]
2. [[Differential attention improves performance on long contexts]]
3. [[Differential attention improves needle-in-haystack performance]]
4. [[Differential attention is based on differential amplifiers]]
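To make the amplifier analogy in point 4 concrete, here is a minimal sketch. It assumes the single-head formulation DiffAttn(X) = (softmax(Q1 K1^T / sqrt(d)) - λ softmax(Q2 K2^T / sqrt(d))) V, where the queries and keys are split into two groups. Like a differential amplifier, it subtracts a second attention map from the first, so any component the two maps share (common-mode "noise") cancels out. For simplicity, λ is a fixed scalar. Multi-head splitting, normalization, and any learned parameterization of λ are omitted. The helper names (`attention_map`, `differential_map`, `diff_attention`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(q: Matrix, k: Matrix) -> Matrix:
    """Ordinary attention weights softmax(Q K^T / sqrt(d)) for one query/key group."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])


def differential_map(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix, lam: float) -> Matrix:
    """A1 - lam * A2: the second map is subtracted like the inverting input of an amplifier.

    lam is a fixed scalar here (simplifying assumption)."""
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    return [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]


def diff_attention(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix, v: Matrix, lam: float) -> Matrix:
    """Apply the differential attention map to the values V."""
    return matmul(differential_map(q1, k1, q2, k2, lam), v)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0], [2.0], [3.0]]
    # Identical maps are pure "common mode": with lam = 1 the output vanishes.
    print(diff_attention(q, k, q, k, v, 1.0))
    # With lam = 0 the second map is ignored and this reduces to standard attention.
    print(diff_attention(q, k, q, k, v, 0.0))
```

Each row of the difference map sums to 1 - λ rather than 1, because each softmax row sums to 1. The weights can therefore be negative, which is what lets the subtraction cancel attention mass the two maps share.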