**Using Stochastic Gradient instead of Batch Gradient**

Stochastic Gradient:

- Faster, especially on large, redundant datasets.
- Better suited to tracking changes, since it updates the weights at each step.
- Often finds a better solution: the fluctuation of the weights lets it wander into different local minima of the cost function.
- The most common way to implement NN learning.

Batch Gradient:

- Its conditions of convergence are analytically more tractable.
- Many acceleration techniques are suited only to batch learning.
- Converges more precisely to a local minimum, since it avoids the weight fluctuation of the stochastic method.
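The difference between the two update rules can be sketched on a toy linear model (a minimal illustration, not from the original notes; the dataset, learning rates, and epoch counts are all made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression problem: y = X @ true_w + noise.
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Batch gradient: one update per epoch, using the gradient over ALL examples.
lr_batch = 0.1
w_batch = np.zeros(3)
for epoch in range(500):
    grad = X.T @ (X @ w_batch - y) / len(y)
    w_batch -= lr_batch * grad

# Stochastic gradient: one update PER EXAMPLE; the weights fluctuate
# from step to step instead of moving along the exact gradient.
lr_sgd = 0.01
w_sgd = np.zeros(3)
for epoch in range(200):
    for i in rng.permutation(len(y)):       # shuffle each epoch
        grad_i = (X[i] @ w_sgd - y[i]) * X[i]
        w_sgd -= lr_sgd * grad_i

# On this convex problem both end up near the true weights; on a
# non-convex cost, SGD's fluctuation is what lets it hop between minima.
print(np.round(w_batch, 2), np.round(w_sgd, 2))
```

Note that batch gradient does one weight update per pass over the data, while SGD does one per example, which is where its speed advantage on large redundant datasets comes from.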

**Shuffling Examples**

- Present the more informative instances to the algorithm first as learning progresses; an instance is more informative when it produces a larger cost or has not been seen before.
- Do not present successive instances from the same class.
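The second point can be enforced directly. A minimal sketch (the toy label array is invented for illustration): a plain per-epoch shuffle already breaks up long runs, and interleaving the classes guarantees no two successive examples share a class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset stored class-by-class: six of class 0, six of class 1.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# A plain per-epoch shuffle already avoids long same-class runs on average.
order = rng.permutation(len(labels))

# Stricter: shuffle within each class, then interleave the classes so that
# no two successive training examples come from the same class.
by_class = [rng.permutation(np.where(labels == c)[0]) for c in (0, 1)]
interleaved = np.stack(by_class, axis=1).ravel()

print(labels[interleaved])  # alternates 0, 1, 0, 1, ...
```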

**Transformation of Inputs**

- Normalize each input variable to zero mean.
- Scale the input variables so that their variances are about the same.
- Decorrelate the input features as much as possible, since two correlated inputs may lead different units to learn the same function, which is redundant.
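The three transformations above can be sketched as a preprocessing pipeline (the correlated toy data is invented for the example; the decorrelation step here uses a PCA-style rotation onto the covariance eigenvectors, one common way to achieve it):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D inputs.
A = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])

# 1. Center: subtract the mean so each input variable averages to zero.
Xc = A - A.mean(axis=0)

# 2. Scale: divide by the standard deviation so the variances match.
Xs = Xc / Xc.std(axis=0)

# 3. Decorrelate: rotate onto the eigenvectors of the covariance matrix,
#    which makes the transformed covariance diagonal (PCA rotation).
cov = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
Xd = Xs @ eigvecs

# Off-diagonal entries of the resulting covariance are ~0.
print(np.round(np.cov(Xd, rowvar=False), 2))
```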