Gradient Descent: The Ultimate Optimizer
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model parameters by manually deriving expressions for "hypergradients" ahead of time. We show how to automatically compute such hypergradients with a simple and elegant modification to backpropagation.
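To make the idea concrete, here is a minimal sketch of hypergradient descent on a one-dimensional quadratic, written in plain Kotlin. This is an illustration of the general technique, not the talk's actual backpropagation-based implementation: for vanilla SGD the hypergradient has a simple closed form, since w_t = w_{t-1} - alpha * g_{t-1} implies dL/dalpha = -g_t * g_{t-1}, and we use that analytic expression directly. All names here (hypergradientDescent, kappa) are illustrative, not from the talk.

```kotlin
// Hypergradient descent on f(w) = (w - 3)^2: the step size alpha is
// itself updated by gradient descent, using the analytic hypergradient
// dL/dalpha = -g_t * g_{t-1} that holds for plain SGD.
fun hypergradientDescent(
    grad: (Double) -> Double,  // gradient of the loss w.r.t. w
    w0: Double,                // initial parameter value
    alpha0: Double,            // initial step size (the hyperparameter)
    kappa: Double,             // "hyper" step size used to tune alpha
    steps: Int
): Pair<Double, Double> {
    var w = w0
    var alpha = alpha0
    var prevGrad = 0.0
    repeat(steps) {
        val g = grad(w)
        // Update alpha by descending the hypergradient -g_t * g_{t-1}:
        // alpha grows while successive gradients point the same way,
        // and shrinks when they start to oscillate.
        alpha -= kappa * (-g * prevGrad)
        // Ordinary SGD step on the model parameter.
        w -= alpha * g
        prevGrad = g
    }
    return Pair(w, alpha)
}

fun main() {
    // Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    // Even starting from a too-small alpha, the hypergradient updates
    // grow the step size and the iterate converges to w = 3.
    val (w, alpha) = hypergradientDescent(
        grad = { w -> 2 * (w - 3) },
        w0 = 0.0, alpha0 = 0.01, kappa = 0.001, steps = 100
    )
    println("w = $w, alpha = $alpha")
}
```

The talk's contribution is that such hypergradients need not be derived by hand, as above: a small modification to backpropagation computes them automatically, and the same trick can be stacked to tune the hyper-step size kappa as well.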
Dynamically optimizing hyperparameters is especially useful when you need to optimize something "on-line", without the chance to tune hyperparameters ahead of time, or when the model's parameters span a wide range of possible values (e.g. well beyond [0..1]). In this talk we illustrate the effectiveness of our method on gradient-based bidirectional GUIs and constrained differential optimization. The only Greek symbols in this talk are variables in example Kotlin code, and all demos use old-school AWT widgets.