$> -\delta$ and $< \delta$, we have that the loss function is continuous and differentiable since it is MSE. When the error is $< -\delta$ or $> \delta$, we also have that the loss function is continuous and differentiable since it is a scaled & translated version of MAE, and we know that MAE is differentiable wherever the error is not 0. And since the two piecewise definitions of the function agree at their boundary, the whole function is continuous.

Now to show differentiability, we need to ensure the slopes match at the points where the error equals $\pm\delta$. The derivative of the function when $|y-f(x)| \leq \delta$ is $y - f(x)$. The derivative when $|y - f(x)| > \delta$ is $\delta \; \cdot \; \text{sign}(y - f(x))$. Then, when $y-f(x) = \delta$ both derivatives equal $\delta$, and when $y - f(x) = -\delta$ both equal $-\delta$, so the slopes match. Thus, the function is differentiable for all errors in $\mathbb{R}$.

**Implementation Note:** a quick-and-dirty way to implement Huber loss is just to use MSE and clip the gradients (that is, do not let the gradient exceed some maximum magnitude).

---

# References

https://en.wikipedia.org/wiki/Huber_loss
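The piecewise derivative above, and the implementation note about clipped MSE gradients, can be sketched numerically. This is a minimal illustration, not a production implementation; the function names `huber_loss` and `huber_grad` are my own, and the loss is written as a function of the error $e = y - f(x)$:

```python
import numpy as np

def huber_loss(error, delta=1.0):
    """Huber loss as a function of the error e = y - f(x)."""
    quadratic = 0.5 * error**2                       # MSE region: |e| <= delta
    linear = delta * (np.abs(error) - 0.5 * delta)   # scaled & translated MAE region
    return np.where(np.abs(error) <= delta, quadratic, linear)

def huber_grad(error, delta=1.0):
    """Derivative of the Huber loss with respect to the error."""
    return np.where(np.abs(error) <= delta, error, delta * np.sign(error))

delta = 1.0
e = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])

# The two pieces agree at the boundary: both formulas give 0.5 * delta**2 at |e| = delta.
print(huber_loss(np.array([delta, -delta]), delta))

# The Huber gradient equals the MSE gradient (which is just e) clipped to [-delta, delta],
# which is why "MSE + gradient clipping" approximates Huber loss.
print(np.allclose(huber_grad(e, delta), np.clip(e, -delta, delta)))
```

The `np.allclose` check makes the implementation note concrete: clipping the MSE gradient at magnitude $\delta$ reproduces the Huber gradient exactly.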