**Sigmoid unit:** σ(x) = 1 / (1 + e^(-x))

**Tanh unit:** tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

**Rectified linear unit (ReLU):** f(x) = max(0, x)

We call:

- a sum of sigmoid units with shifted biases, Σ_i σ(x - i + 0.5), a **stepped sigmoid**
- log(1 + e^x) the **softplus** function

The **softplus** function can be approximated by the **max function** (or **hard max**), i.e. max(0, x).
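As a quick numerical check of this approximation, here is a minimal NumPy sketch (the function names are mine) that evaluates softplus and its hard-max counterpart:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # log(1 + e^x), the smooth approximation of the rectifier
    return np.log1p(np.exp(x))

def relu(x):
    # the "hard max" approximation of softplus
    return np.maximum(0.0, x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(softplus(x))  # approaches 0 for x << 0 and approaches x for x >> 0
print(relu(x))
```

For inputs far from zero the two curves are nearly indistinguishable; they differ most near x = 0, where softplus stays smooth while the max has a kink.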


I've gathered the following from online research so far:

I've used Armadillo a little bit, and found the interface to be intuitive enough, and it was easy to locate binary packages for Ubuntu (and I'm assuming other Linux distros). I haven't compiled it from source, but my hope is that it wouldn't be too difficult. It meets most of my design criteria, and uses dense linear algebra. It can call LAPACK or MKL routines.

I've heard good things about Eigen, but haven't used it. It claims to be fast, uses templating, and supports dense linear algebra. It doesn't have LAPACK or BLAS as a dependency, but appears to be able to do everything that LAPACK can do (plus some things LAPACK can't). A lot of projects use Eigen.

**Using Stochastic Gradient instead of Batch Gradient**

Stochastic gradient:

- faster
- better suited to tracking changes at each step
- often results in a better solution: the fluctuation of the weights lets it wander into different local minima of the cost function
- the most common way to implement NN learning

Batch gradient:

- its convergence behavior is analytically more tractable
- many acceleration techniques are suited to batch learning
- more accurate convergence to a local minimum (again, no weight fluctuation as in the stochastic method)
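The two update schemes can be sketched on a toy least-squares problem; the learning rates, epoch counts, and toy data below are illustrative choices of mine, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linear data: y = 2x + 1 plus noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.standard_normal(200)
Xb = np.hstack([X, np.ones((200, 1))])   # append a bias column

def batch_gradient(w, Xb, y):
    # gradient of the mean squared error over the WHOLE training set
    return 2.0 * Xb.T @ (Xb @ w - y) / len(y)

# batch gradient descent: one weight update per full pass over the data
w_batch = np.zeros(2)
for _ in range(500):
    w_batch -= 0.1 * batch_gradient(w_batch, Xb, y)

# stochastic gradient descent: one weight update per training example
w_sgd = np.zeros(2)
for epoch in range(20):
    for i in rng.permutation(len(y)):       # visit examples in random order
        xi, yi = Xb[i], y[i]
        w_sgd -= 0.05 * 2.0 * (xi @ w_sgd - yi) * xi

print(w_batch, w_sgd)  # both should end up near [2, 1]
```

Note how the stochastic weights keep fluctuating around the minimum rather than settling exactly on it, which is the trade-off described in the lists above.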

**Shuffling Examples**

- As learning proceeds, give the more informative instances to the algorithm next (an instance is more informative if it causes a larger cost or has not been seen before).
- Do not give the algorithm successive instances from the same class.
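A common way to satisfy the second point is to re-shuffle the training set at the start of every epoch, so long runs of same-class examples are broken up; a minimal sketch with toy data of mine:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(8).reshape(8, 1)                 # 8 toy examples
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # sorted by class: the worst order for SGD

for epoch in range(2):
    order = rng.permutation(len(X))            # a fresh random order each epoch
    for i in order:
        pass  # one SGD update on (X[i], labels[i]) would go here
    print(labels[order])
```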

**Transformation of Inputs**

- Normalize each input variable to zero mean.
- Scale the input variables so that their covariances are roughly the same.
- Reduce correlations between features as much as possible (two correlated inputs may cause different units to learn the same function, which is redundant).
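The three transformations above can be sketched with NumPy; the toy correlated data and the variable names are mine, and the decorrelation step uses a plain PCA rotation as one possible choice:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy data with correlated features and a non-zero mean
A = rng.standard_normal((500, 2)) @ np.array([[2.0, 1.2], [0.0, 0.5]]) + np.array([3.0, -1.0])

X = A - A.mean(axis=0)                 # 1) zero-mean each input variable
X = X / X.std(axis=0)                  # 2) scale to comparable variance
cov = np.cov(X, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)   # 3) decorrelate by rotating the data
Xw = X @ eigvec                        #    onto its principal axes (PCA)

print(np.round(np.cov(Xw, rowvar=False), 3))  # off-diagonal entries ~ 0
```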

**What is anomaly detection?** It is the task of detecting an **outlier data** point among other points that follow some kind of regular distribution. The **outlier** is also called an **anomalous** point (**Figure 1**).

**What are the applications?**

- **Fraud user activity detection**: a way of detecting hacker activity on web applications or network connections by watching varying attributes of the current session. For example, an application can keep track of a user's inputs to the website and the workload the user places on the system; from these attribute values the detection system decides whether a particular action is fraudulent and kicks the user out if it is.
- **Data center monitoring**: you might be running a data center with a vast number of computers, so it is really hard to check each computer regularly for flaws. An anomaly detection system can watch each computer's network connection parameters and its CPU and memory loads, and detect any problem on the machine.
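One simple way to build such a detector (a possible approach, not necessarily the one this post has in mind) is to fit an independent Gaussian to each attribute of known-healthy machines and flag points whose density falls below a threshold; all numbers below are toy values of mine:

```python
import numpy as np

rng = np.random.default_rng(7)

# training data: CPU load and memory load of healthy machines (toy numbers)
normal = rng.normal(loc=[0.4, 0.5], scale=[0.05, 0.08], size=(1000, 2))

mu = normal.mean(axis=0)
var = normal.var(axis=0)

def density(x):
    # product of per-attribute Gaussian densities at the attribute vector x
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

eps = 1e-3   # threshold; in practice chosen on a labelled validation set

print(density(np.array([0.41, 0.52])) > eps)  # a typical machine  -> True
print(density(np.array([0.95, 0.99])) > eps)  # overloaded machine -> False
```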

K-means is the most primitive and easiest-to-use clustering algorithm (and also a machine learning algorithm).

There are 4 basic steps of K-means:

1. Choose K different initial data points in instance space as the initial centroids (a centroid is the mean point of a cluster, summarizing the attributes of that class).
2. Assign each object to the nearest centroid.
3. After all the objects are assigned, recalculate the centroids by taking the averages of the current classes (clusters).
4. Repeat steps 2-3 until the centroids are stabilized.
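The four steps above can be sketched in NumPy; this is a toy implementation (it ignores edge cases such as empty clusters), not production code:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1) pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2) assign every object to its nearest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # 3) recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4) stop once the centroids have stabilized
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# two well-separated toy blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 1))  # one centroid near (0, 0), one near (5, 5)
```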

Caveats for K-means:

- Although it can be proved that the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration corresponding to the global minimum of the objective function.

- The algorithm is also highly sensitive to the initial randomly selected cluster centres. The k-means algorithm can be run multiple times to reduce this effect.

Here is a basic animation showing the intuition of K-means.

- **Convexity**, including convex optimization and formulation of problems as convex programs. Two important subsets of this are linear programming and proximal gradient-style optimization algorithms and formulations, which have a ridiculously vast array of applications for industrial engineering and machine learning.

- **Probabilistic modeling and inference**: graphical models and max-entropy models are the most important, and have a vast array of applications in machine learning and more structured statistical modeling. Markov Chain Monte Carlo is a terrific and amazing algorithm with a great special case called Gibbs sampling; both provide almost generic methods of inference.