How to Initialize weights in a neural net so it performs well? — Super fast explanation for Xavier’s Random Weight Initialization
http://www.mdpi.com/1099-4300/19/3/101
In a neural network, weights are usually initialized randomly, and a naive random initialization can take a significant number of iterations to converge to a low loss and reach a good weight matrix. Worse, this kind of initialization is prone to vanishing or exploding gradients.
One way to reduce this problem is to choose the random weight initialization carefully. Xavier’s random weight initialization, also known as Xavier’s algorithm, takes the size of the layer (its number of input and output neurons) into account when scaling the weights, and in doing so addresses these problems.
Xavier Glorot and Yoshua Bengio introduced this way of initializing random weights. It not only reduces the chance of running into gradient problems but also helps the network converge to a low error faster.
General ways to initialize better weights:
a) If you’re using the ReLU activation function in a deep net (I’m talking about the activation function on the hidden layers’ outputs), then:
- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (2/ni), where ni is the number of input units for that layer.
b) Likewise, if you’re using the Tanh activation function:
- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (1/ni), where ni is the number of input units for that layer.
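The two recipes above can be sketched in a few lines of NumPy. This is only a minimal illustration: the helper name `init_weights`, the `(no, ni)` weight shape, and the layer sizes 512 and 256 are assumptions made for this example, not something the article specifies.

```python
import numpy as np

def init_weights(ni, no, activation="relu", rng=None):
    """Sample an (no, ni) weight matrix using rule (a) or rule (b).

    ni, no: number of input / output units of the layer (hypothetical helper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Rule (a): ReLU scales by sqrt(2/ni); rule (b): tanh scales by sqrt(1/ni).
    factor = {"relu": 2.0, "tanh": 1.0}[activation]
    # Step 1: standard Gaussian sample (mean 0, std 1).
    # Step 2: multiply by the square-root scaling factor.
    return rng.standard_normal((no, ni)) * np.sqrt(factor / ni)

W_relu = init_weights(512, 256, "relu", np.random.default_rng(0))
W_tanh = init_weights(512, 256, "tanh", np.random.default_rng(0))
print(W_relu.shape, W_relu.std(), W_tanh.std())
```

The printed standard deviations should land close to sqrt(2/512) and sqrt(1/512) respectively, since scaling a standard Gaussian by a constant sets its standard deviation to that constant.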
So what is Xavier’s initialization?
The only major difference in Xavier’s initialization is the output term, no: the scaling factor also takes into account the number of output units for that layer.
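For Tanh:
- Generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1.
- Multiply that sample by the square root of (2/(ni+no)), where ni is the number of input units and no is the number of output units for that layer. This gives the weights the variance 2/(ni+no) used in the original paper; when ni = no it reduces to the 1/ni rule above.
# Python code
import numpy as np
ni, no = 4, 3  # example layer sizes, chosen only for illustration
# randn samples a standard Gaussian (rand would be uniform on [0, 1))
W = np.random.randn(no, ni) * np.sqrt(2 / (ni + no))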
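A rough sketch of where the ni + no term comes from, following the argument in the Glorot and Bengio paper. It assumes a linear layer whose weights and inputs are independent with zero mean, which is only an approximation for tanh near zero; the symbols z, x, and w are introduced here for illustration.

```latex
\begin{align*}
z = \sum_{i=1}^{n_i} w_i x_i \quad &\Rightarrow \quad
  \operatorname{Var}(z) = n_i \, \operatorname{Var}(w) \, \operatorname{Var}(x) \\
\text{forward signal keeps its variance:} \quad & n_i \operatorname{Var}(w) = 1 \\
\text{backward gradient keeps its variance:} \quad & n_o \operatorname{Var}(w) = 1 \\
\text{compromise between the two:} \quad & \operatorname{Var}(w) = \frac{2}{n_i + n_o}
\end{align*}
```

Both conditions cannot hold at once unless ni = no, so the paper settles on the compromise in the last line, which corresponds to a standard deviation of sqrt(2/(ni+no)).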
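Why does this initialization help prevent gradient problems?
This kind of initialization keeps the weights from being either too large or too small for the size of the layer, so the scale of the signal stays roughly the same from one layer to the next. Weights that are too large make values grow or saturate as they pass through the layers (exploding gradients), and weights that are too small make them shrink toward zero (vanishing gradients).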
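The effect can be checked with a small NumPy experiment that pushes random inputs through a stack of tanh layers and records the spread of the activations. This is a sketch under assumed settings: the width of 256, the depth of 20, the batch of 1000 inputs, and the "too small" (0.01) and "too large" (1.0) scales are arbitrary choices for illustration, and it only looks at the forward signal, not at the gradients themselves.

```python
import numpy as np

def forward_std(scale_fn, n_layers=20, width=256, seed=0):
    """Return the std of the activations after each tanh layer."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((width, 1000))  # 1000 random input vectors
    stds = []
    for _ in range(n_layers):
        # Standard Gaussian weights multiplied by the chosen scale.
        W = rng.standard_normal((width, width)) * scale_fn(width, width)
        a = np.tanh(W @ a)
        stds.append(a.std())
    return stds

too_small = forward_std(lambda ni, no: 0.01)
xavier = forward_std(lambda ni, no: np.sqrt(2 / (ni + no)))
too_large = forward_std(lambda ni, no: 1.0)

print("last-layer std, scale 0.01:", too_small[-1])
print("last-layer std, Xavier:    ", xavier[-1])
print("last-layer std, scale 1.0: ", too_large[-1])
```

With the small scale the activations collapse toward zero, with the large scale tanh saturates near -1 and +1 (where its derivative is close to zero), and with the Xavier scale the activations stay in a usable range even after 20 layers.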
I learnt this from Coursera’s Awesome Deep Learning Specialization: deeplearning.ai
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization: https://www.coursera.org/learn/deep-neural-network/
Here is the original Paper:
Understanding the difficulty of training deep feedforward neural networks. Xavier Glorot, Yoshua Bengio; PMLR 9:249–256
How to Initialize weights in a neural net so it performs well? was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.