RMSProp

RMSProp (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm for training deep neural networks. It was proposed by Geoffrey Hinton in his Coursera course "Neural Networks for Machine Learning". RMSProp modifies plain gradient descent to address some of its issues, such as slow convergence and sensitivity to the choice of initial learning rate.

The weights in RMSProp are updated as follows:

$$w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t$$

Where:

$w_{t+1}$ are the updated weights.
$w_t$ are the current weights.
$\eta$ is the learning rate.
$E[g^2]_t$ is the running average of squared gradients at time step $t$.
$\epsilon$ is a small number to prevent division by zero (usually on the order of 1e-8).
$g_t$ is the gradient at time step $t$.

The running average at time step $t$, $E[g^2]_t$, is calculated as follows:

$$E[g^2]_t = \beta \, E[g^2]_{t-1} + (1 - \beta) \, g_t^2$$

Where:

$\beta$ is a decay factor that controls the size of the window of past gradients that are averaged (typically set to 0.9).
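
The update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the toy objective, and the chosen learning rate are assumptions made for the example, and the small constant is placed inside the square root, which is one common convention (some libraries add it outside instead).

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    """Apply one RMSProp update and return (new_weights, new_running_average)."""
    # Exponential moving average of the squared gradient, kept per parameter.
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    # Divide the step by the root of the running average; eps avoids division by zero.
    w = w - lr * grad / np.sqrt(avg_sq + eps)
    return w, avg_sq

# Toy objective (assumed for illustration): f(w) = sum(w**2), so grad = 2 * w.
w = np.array([1.0, -2.0])
avg_sq = np.zeros_like(w)  # the running average starts at zero
for _ in range(1000):
    grad = 2.0 * w
    w, avg_sq = rmsprop_step(w, grad, avg_sq, lr=0.01)

print(w)  # both entries end up close to the minimum at zero
```

Because the running average starts at zero in this sketch, the very first step has a magnitude of roughly lr / sqrt(1 - decay) for every parameter, regardless of how large its gradient is.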

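This means that RMSProp uses a moving average of squared gradients to normalize the gradient itself. Parameters whose recent gradients are large take proportionally smaller steps, and parameters whose recent gradients are small take proportionally larger ones. Like momentum, this can accelerate convergence in settings where it would otherwise be slow, although the mechanism differs: momentum accumulates the gradient direction, whereas RMSProp rescales the step size.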
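
The normalizing effect can be seen with a small numerical sketch. It assumes two parameters whose gradients are held constant at very different scales (the values 100 and 0.01 are made up for illustration) and compares the plain gradient descent step with the normalized step once the moving average has settled.

```python
import numpy as np

lr, decay, eps = 0.01, 0.9, 1e-8
grad = np.array([100.0, 0.01])  # two parameters with very different gradient scales
avg_sq = np.zeros_like(grad)

# Hold the gradient fixed so that only the scaling effect is visible.
for _ in range(100):
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2

plain_step = lr * grad                              # plain gradient descent step
rmsprop_update = lr * grad / np.sqrt(avg_sq + eps)  # normalized step

print("plain gradient descent step:", plain_step)
print("normalized step:            ", rmsprop_update)
```

In this setting the plain steps differ by a factor of 10,000, while both normalized steps come out close to the learning rate itself.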

Advantages of RMSProp

Faster convergence: RMSProp often converges faster than plain gradient descent. Because each step is divided by the root of the recent average squared gradient, parameters with small gradients still take reasonably sized steps instead of stalling.
Less sensitivity to the initial learning rate: RMSProp is less sensitive to the choice of initial learning rate, because it rescales each step by the magnitude of recent gradients rather than relying on the raw gradient scale.
Efficient for non-stationary objectives: RMSProp performs well in online and non-stationary settings, because the moving average of squared gradients discounts old gradients and tracks the current gradient scale.
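
As a rough illustration of the convergence claim, the sketch below compares plain gradient descent and RMSProp on a poorly conditioned quadratic. The objective, starting point, step count, and learning rate are all assumptions chosen for the example; the gradient descent learning rate is kept small enough for the steep direction to remain stable, which leaves the flat direction with tiny steps. It is a single toy problem, not evidence of a general result.

```python
import numpy as np

def grad(w):
    # Gradient of the assumed toy objective f(w) = 0.5 * (100 * w[0]**2 + 0.01 * w[1]**2).
    return np.array([100.0, 0.01]) * w

def run_gd(w, lr, steps):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def run_rmsprop(w, lr, steps, decay=0.9, eps=1e-8):
    avg_sq = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1.0 - decay) * g ** 2
        w = w - lr * g / np.sqrt(avg_sq + eps)
    return w

w0 = np.array([1.0, 1.0])
w_gd = run_gd(w0, lr=0.01, steps=500)
w_rms = run_rmsprop(w0, lr=0.01, steps=500)

print("gradient descent:", w_gd)
print("RMSProp:         ", w_rms)
```

With these settings, gradient descent barely moves the second coordinate, whose gradient is small, while RMSProp brings both coordinates near the minimum at zero.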

Disadvantages of RMSProp

Hyperparameter tuning: Despite being less sensitive to the initial learning rate, RMSProp still requires tuning of the decay factor.
Lack of theoretical justification: While RMSProp has been found empirically to work well, there is a lack of theoretical understanding or justification for why this is the case.
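
To make the tuning trade-off for the decay factor more concrete, the following sketch feeds a synthetic gradient signal whose magnitude jumps from 1 to 10 into the moving average and counts how many steps the estimate needs to come within 10% of the new scale. The signal, the threshold, and the helper name `running_rms` are assumptions for illustration; the "effective window" of 1 / (1 - decay) steps is a common rule of thumb rather than an exact quantity.

```python
import numpy as np

def running_rms(grads, decay):
    """Return the root of the moving average of squared gradients after each step."""
    avg_sq, out = 0.0, []
    for g in grads:
        avg_sq = decay * avg_sq + (1.0 - decay) * g ** 2
        out.append(np.sqrt(avg_sq))
    return np.array(out)

# Assumed synthetic signal: gradient magnitude jumps from 1 to 10 at step 500.
jump = 500
grads = np.concatenate([np.ones(jump), 10.0 * np.ones(500)])

steps_to_adapt = {}
for decay in (0.5, 0.9, 0.99):
    rms = running_rms(grads, decay)
    # First step after the jump at which the estimate is within 10% of the new scale.
    steps_to_adapt[decay] = int(np.argmax(rms[jump:] >= 9.0)) + 1
    print(f"decay={decay}: effective window ~{1 / (1 - decay):.0f} steps, "
          f"adapts in {steps_to_adapt[decay]} steps")
```

A smaller decay factor reacts to the change within a few steps but averages over fewer gradients, so its estimate is noisier on real data; a larger one is smoother but slower to adapt.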