202410091518
Status: #idea
Tags: #ai #deep_learning #transformers #llm #electrical_engineering
# Differential attention is based on differential amplifiers
"A differential amplifier is a type of electronic amplifier that amplifies the difference between two input voltages but suppresses any voltage common to the two inputs." (Wikipedia).
![[Pasted image 20241009212653.png]]
The output of the differential amplifier is given by the equation:
$$V_{\text{out}} = A (V_{\text{in}}^+ - V_{\text{in}}^-)$$
where $A$ is the gain of the amplifier (i.e. how much it increases the amplitude of the signal).
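A minimal numeric sketch of this equation (the gain, voltages, and noise values below are made up for illustration): any noise added equally to both inputs cancels in the subtraction, so only the difference is amplified.

```python
# Hypothetical differential amplifier: amplifies the difference,
# rejects the common-mode term shared by both inputs.
A = 100.0                      # gain (assumed value)
v_plus, v_minus = 1.02, 1.00   # input voltages in volts (assumed values)
common_noise = 0.5             # noise added identically to both inputs

# The common-mode term cancels: A * ((v+ + n) - (v- + n)) = A * (v+ - v-)
v_out = A * ((v_plus + common_noise) - (v_minus + common_noise))
print(v_out)  # ~2.0: only the 0.02 V difference is amplified
```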
Compare this to the differential attention equation:
$$(\text{softmax}(Q_1 K_1^T / \sqrt{d}) - \lambda \cdot \text{softmax}(Q_2 K_2^T / \sqrt{d})) \cdot V$$
In this analogy, the two softmax attention maps fill the role of the two input voltages: their subtraction (scaled by $\lambda$) cancels attention "noise" common to both maps. The value matrix $V$ then acts as the amplifier's gain: rows of the differential attention map that align with $V$ are amplified, while rows that do not are suppressed.
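The equation above can be sketched directly in NumPy. This is a single-head toy version (the shapes, the fixed $\lambda$, and the function name `diff_attention` are illustrative assumptions, not the paper's implementation, which uses learned $\lambda$ and multi-head projections):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam):
    # (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V
    d = Q1.shape[-1]
    a1 = softmax(Q1 @ K1.T / np.sqrt(d))
    a2 = softmax(Q2 @ K2.T / np.sqrt(d))
    return (a1 - lam * a2) @ V

# Toy shapes: 4 tokens, head dimension 8 (assumed values).
rng = np.random.default_rng(0)
n, d = 4, 8
Q1, K1, Q2, K2 = (rng.standard_normal((n, d)) for _ in range(4))
V = rng.standard_normal((n, d))

out = diff_attention(Q1, K1, Q2, K2, V, lam=0.5)
print(out.shape)  # (4, 8): one output vector per token
```

Note that with `lam=0` this reduces to ordinary softmax attention, which is the amplifier analogy's "single-ended" input case.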
---
# References
[[Differential Transformer]]