Posts tagged with: machine learning

Deep Learning Resources

Here is a collection of resources about deep learning and neural networks. I have not read all of the papers or watched all of the videos, so I cannot vouch for them.


  • Deep Learning:

    Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

    This website is intended to host a variety of resources and pointers to information about Deep Learning.

    For the latest additions, including papers and software announcements, be sure to visit the Blog section of the website. Contact us if you have any comments or suggestions!

  • Geoffrey E. Hinton: Papers (including tutorials) and videos.
  • Jürgen Schmidhuber: Papers and links.

Genetic Algorithms and a Great Talk

I just watched a great talk given by Elad Katz. Even with all my background in machine learning algorithms, the capacity of evolutionary algorithms is simply stunning: the basic idea is remarkably simple compared to the sophisticated counterparts that are widely used in the learning literature. I am not being glib here. If you don't believe me, just take a look at the talk, especially some of the demos in the middle of it.
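The basic idea really is that simple. Here is a minimal genetic algorithm sketch in Python (my own illustration, unrelated to the talk): it evolves bit strings toward the all-ones string (the classic OneMax problem) using tournament selection, single-point crossover, and bit-flip mutation. The population size, mutation rate, and fitness function are all illustrative choices.

```python
import random

random.seed(1)

GENOME_LEN = 20
POP_SIZE = 30
GENERATIONS = 60
MUTATION_RATE = 1.0 / GENOME_LEN

def fitness(genome):
    # OneMax: count the 1-bits; the optimum is the all-ones string
    return sum(genome)

def tournament(pop, k=3):
    # Selection: the fittest of k randomly drawn individuals wins
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point crossover of two parent genomes
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit independently with a small probability
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
       for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
print(fitness(best))  # the best fitness found in the final population
```

With these settings the population typically converges to, or very near, the optimum of 20 within a few dozen generations, despite no part of the algorithm "knowing" anything about the structure of the problem.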



Simple Parallel Processing in Python

Here is a very concise overview of Python's multiprocessing module and its benefits. It is certainly an important module for large-scale data mining and machine learning projects, and for Kaggle-like challenges. Take a brief look at the slides to discover how to speed up your project cycle.

For more info, refer to:

What is different between Random Forests and Gradient Boosted Trees?

This is a common point of confusion, especially for beginners and practitioners of machine learning. So here I devote a little space to Random Forests and Gradient Boosted Trees.
To begin with, divide the differences into two perspectives: algorithmic and practical.
Algorithmically, Random Forests are trained on random samples of the data (with even more randomized variants available, such as feature randomization), and they rely on this randomization to achieve better generalization performance beyond the training set.
Gradient Boosted Trees, on the other hand, additionally try to find an optimal linear combination of trees (the final model is a weighted sum of the predictions of the individual trees) with respect to the given training data. This extra tuning can be deemed the key difference. Note that there are many variations of both algorithms as well.
On the practical side: owing to this tuning stage, Gradient Boosted Trees are more susceptible to noisy data. The final weighting stage makes GBT more likely to overfit, so if the test cases differ noticeably from the training cases, the algorithm starts to fall short. Random Forests, by contrast, are better at resisting overfitting, although they lag behind in the other respect.
So, as always, the best choice depends on the case you have.
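To make the algorithmic contrast concrete, here is a toy sketch in plain Python (my own illustration, not a production implementation) on a tiny 1-D regression problem. The "forest" averages stumps fitted to bootstrap samples; the "boosting" loop fits each new stump to the residuals of the running model and adds it to the weighted sum with a fixed learning rate. The helper names (`fit_stump`, `rf_pred`, `gbt_pred`) and all constants are hypothetical choices of mine.

```python
import random

random.seed(0)

# Toy 1-D regression data: y = 2x on ten points
xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]

def fit_stump(xs, targets):
    # Fit the best single-threshold stump: two leaves, each predicting its mean
    best = None
    for t in xs:
        left = [v for x, v in zip(xs, targets) if x < t]
        right = [v for x, v in zip(xs, targets) if x >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - (lm if x < t else rm)) ** 2
                  for x, v in zip(xs, targets))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x < t else rm

# Random-Forest flavour: stumps trained on bootstrap samples, predictions averaged
forest = []
for _ in range(50):
    idx = [random.randrange(len(xs)) for _ in xs]
    forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

def rf_pred(x):
    return sum(s(x) for s in forest) / len(forest)

# Gradient-Boosting flavour: each stump fits the residuals of the running model
# and enters the final weighted sum scaled by a learning rate
lr = 0.5
boosted = []
residuals = ys[:]
for _ in range(50):
    s = fit_stump(xs, residuals)
    boosted.append(s)
    residuals = [r - lr * s(x) for x, r in zip(xs, residuals)]

def gbt_pred(x):
    return lr * sum(s(x) for s in boosted)
```

The boosting loop keeps shrinking the training residuals round after round, which is exactly the extra tuning stage, and the extra overfitting risk, described above; the forest never looks at its own training error at all.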

Setting up cudamat in Ubuntu Machine

cudamat is a Python library that lets you use the benefits of CUDA from Python instead of intricate low-level approaches.

Before following these steps, please make sure that you have a working CUDA installation.

  1. Download cudamat from
  2. Compile with ‘make’ in the root of the downloaded folder /path/to/cudamat
  3. Set the environment variables to include cudamat in PYTHONPATH so it can be imported by any script. Run the following in the command line.
     export PYTHONPATH=$PYTHONPATH:/path/to/cudamat
  4. You are ready to use cudamat.

Here is a simple piece of code you can test:

 # -*- coding: utf-8 -*-
 import numpy as np
 import cudamat as cm
 # create two random matrices and copy them to the GPU
 a = cm.CUDAMatrix(np.random.rand(32, 256))
 b = cm.CUDAMatrix(np.random.rand(256, 32))
 # multiply a by b and sum the columns of the product, all on the GPU
 c =, b)
 d = c.sum(axis=0)
 # copy d back to the host (CPU) and print it
 print(d.asarray())

Note: If you get any other path problem, it is likely related to the CUDA installation, so check the environment variables that need to be set for CUDA.


How is big data helpful for people and businesses?

All kinds of data are useful to companies, since the more data they acquire, the better they can understand and direct their customers. That is the main purpose of the business: understand customers so that you can make them happy and keep them within the company's borders. The only channel of communication, then, is the data provided by all those people. Companies hire data analysts and try to uncover some unknowns.