202410091518
Status: #idea
Tags: #ai #deep_learning #transformers #llm #electrical_engineering
# Differential attention is based on differential amplifiers
"A differential amplifier is a type of electronic amplifier that amplifies the difference between two input voltages but suppresses any voltage common to the two inputs." (Wikipedia).
![[Pasted image 20241009212653.png]]
The output of the differential amplifier is given by the equation:
$$V_{\text{out}} = A (V_{\text{in}}^+ - V_{\text{in}}^-)$$
where $A$ is the gain of the amplifier (i.e. how much it increases the amplitude of the signal).
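A minimal numeric sketch of this equation (the gain, voltages, and noise values below are made up for illustration): any noise added equally to both inputs cancels in the subtraction, so only the difference is amplified.

```python
# Hypothetical differential amplifier: amplifies the difference,
# rejects the common-mode term shared by both inputs.
A = 100.0                      # gain (assumed value)
v_plus, v_minus = 1.02, 1.00   # input voltages in volts (assumed values)
common_noise = 0.5             # noise added identically to both inputs

# The common-mode term cancels: A * ((v+ + n) - (v- + n)) = A * (v+ - v-)
v_out = A * ((v_plus + common_noise) - (v_minus + common_noise))
print(v_out)  # ~2.0: only the 0.02 V difference is amplified
```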
Compare this to the differential attention equation:
$$(\text{softmax}(Q_1 K_1^T / \sqrt{d}) - \lambda \cdot \text{softmax}(Q_2 K_2^T / \sqrt{d})) \cdot V$$
In this analogy, the two softmax attention maps fill the role of the two input voltages: their subtraction (scaled by $\lambda$) cancels attention "noise" common to both maps. The value matrix $V$ then acts as the amplifier's gain: rows of the differential attention map that align with $V$ are amplified, while rows that do not are suppressed.
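The equation above can be sketched directly in NumPy. This is a single-head toy version (the shapes, the fixed $\lambda$, and the function name `diff_attention` are illustrative assumptions, not the paper's implementation, which uses learned $\lambda$ and multi-head projections):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam):
    # (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V
    d = Q1.shape[-1]
    a1 = softmax(Q1 @ K1.T / np.sqrt(d))
    a2 = softmax(Q2 @ K2.T / np.sqrt(d))
    return (a1 - lam * a2) @ V

# Toy shapes: 4 tokens, head dimension 8 (assumed values).
rng = np.random.default_rng(0)
n, d = 4, 8
Q1, K1, Q2, K2 = (rng.standard_normal((n, d)) for _ in range(4))
V = rng.standard_normal((n, d))

out = diff_attention(Q1, K1, Q2, K2, V, lam=0.5)
print(out.shape)  # (4, 8): one output vector per token
```

Note that with `lam=0` this reduces to ordinary softmax attention, which is the amplifier analogy's "single-ended" input case.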
---
# References
[[Differential Transformer]]