Key points:

1. [[Differential attention reduces noise in the attention map]]
2. [[Differential attention improves performance on long contexts]]
3. [[Differential attention improves needle-in-haystack performance]]
4. [[Differential attention is based on differential amplifiers]]
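
The differential-amplifier analogy can be made concrete with a small sketch. It assumes the usual formulation in which two softmax attention maps are computed from separate query/key projections and the second is subtracted from the first, scaled by a factor λ, so that attention noise common to both maps cancels. The function names and the fixed `lam` value are illustrative assumptions (λ is typically learned), and per-head normalization is omitted for brevity.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(Q, K):
    # Standard scaled dot-product attention weights: softmax(Q K^T / sqrt(d)).
    d = len(Q[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]

def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    # Two attention maps from two (hypothetical) query/key projections.
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    # Differential map: the second map is subtracted, scaled by lam,
    # much like a differential amplifier rejects common-mode signal.
    diff = [[a1 - lam * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    # Apply the differential weights to the values.
    out = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in diff
    ]
    return diff, out
```

When both maps are identical and `lam` is 1, the differential map is zero everywhere, which is the common-mode cancellation case. Note that the resulting weights can be negative and each row sums to `1 - lam` rather than 1.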