Posts tagged with: machine learning

Genetic Algorithms and a Great Talk

I just watched a great talk given by Elad Katz. Even with all my background in machine learning algorithms, I find the capacity of evolutionary algorithms simply stunning, especially because the basic idea is so simple compared to the sophisticated counterparts widely used in the learning literature. I am not being glib here. If you don't believe me, just take a look at the talk, especially some of the demos in the middle of it.

 


Simple Parallel Processing in Python

Here is a very concise overview of the Python multiprocessing module and its benefits. It is certainly an important module for large-scale data mining and machine learning projects and for Kaggle-style challenges, so take a brief look at the slides to discover how to speed up your project cycle.
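To give a flavor of the module, here is a minimal sketch of `multiprocessing.Pool`. The `square` function is a hypothetical stand-in for any CPU-bound per-item work (feature extraction, per-fold training, file parsing, and so on):

```python
from multiprocessing import Pool

def square(x):
    # Stand-in for any expensive, CPU-bound function.
    return x * x

def parallel_map(func, items, workers=4):
    # Pool.map splits `items` across worker processes and
    # returns the results in the original order.
    with Pool(processes=workers) as pool:
        return pool.map(func, items)

if __name__ == "__main__":
    print(parallel_map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note the `if __name__ == "__main__":` guard: on platforms that spawn rather than fork, workers re-import the module, and the guard keeps them from recursively creating pools.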

For more info refer to:


What is the difference between Random Forests and Gradient Boosted Trees?

This is a common point of confusion, especially for beginners and practitioners of Machine Learning. So here I share a little space to talk about Random Forests and Gradient Boosted Trees.
To begin with, divide the differences into two perspectives: algorithmic and practical.
The algorithmic difference is this: Random Forests are trained on random samples of the data (with even more randomized variants available, such as feature randomization), and they rely on that randomization for better generalization performance beyond the training set.
At the other end of the spectrum, the Gradient Boosted Trees algorithm additionally tries to find an optimal linear combination of trees (assume the final model is the weighted sum of the predictions of the individual trees) with respect to the given training data. This extra tuning might be deemed the key difference. Note that there are many variations of both algorithms as well.
On the practical side: owing to this tuning stage, Gradient Boosted Trees are more susceptible to noisy data. The final stage makes GBT more likely to overfit, so if the test cases diverge from the training cases, the algorithm starts to fall short. On the contrary, Random Forests are better at resisting overfitting, although they can lag behind in the other direction.
So the best choice depends on the case you have, as always.
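To make the two combination rules concrete, here is a toy sketch in plain Python. It uses no real decision trees — each "tree" is just a constant predictor (a depth-0 stump), which is an illustrative simplification — but it shows the structural difference: RF averages independently trained predictors, while GBT adds predictors fitted to the residuals of the running ensemble.

```python
import random

def fit_constant(targets):
    # A "tree" of depth 0: predicts the mean of its training targets.
    return sum(targets) / len(targets)

def rf_predict(y, n_trees=10, seed=0):
    # Random Forest style: train each tree on a bootstrap sample
    # (random sample with replacement), then average the
    # independent predictions.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(y) for _ in y]
        preds.append(fit_constant(sample))
    return sum(preds) / n_trees

def gbt_predict(y, n_stages=10, lr=0.5):
    # Gradient Boosting style: each stage fits the residual left by
    # the current ensemble; the final model is a weighted sum of
    # the stage predictions.
    pred = 0.0
    for _ in range(n_stages):
        residual = [t - pred for t in y]
        pred += lr * fit_constant(residual)
    return pred
```

Real implementations fit full decision trees to the loss gradient at each stage; the constant predictor here only illustrates the combination rule and why the boosting stages depend on each other while the forest's trees do not.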

Setting up cudamat in Ubuntu Machine

cudamat is a Python library that lets you use the benefits of CUDA from Python instead of resorting to intricate low-level approaches.

Before following these steps, please make sure that you have installed a working CUDA library.

  1. Download cudamat from
  2. Compile with 'make' in the root of the downloaded folder /path/to/cudamat
  3. Set the environment variables to include cudamat in PYTHONPATH so it can be imported by any script. Run the following in the command line.
     PYTHONPATH=$PYTHONPATH:/path/to/cudamat
     export PYTHONPATH
  4. You are ready to use cudamat.

Here is a simple snippet you might use to test it:

 # -*- coding: utf-8 -*-
 import numpy as np
 import cudamat as cm
 cm.cublas_init()
 # create two random matrices and copy them to the GPU
 a = cm.CUDAMatrix(np.random.rand(32, 256))
 b = cm.CUDAMatrix(np.random.rand(256, 32))
 # perform calculations on the GPU
 c = cm.dot(a, b)
 d = c.sum(axis=0)
 # copy d back to the host (CPU) and print
 print(d.asarray())

Note: If you get any other path problem, it is probably related to the CUDA installation itself, so check the environment variables that need to be set for CUDA.
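For reference, a typical CUDA environment setup looks like the following. The /usr/local/cuda prefix is an assumption — adjust it to wherever your CUDA toolkit is installed:

```shell
# Typical CUDA environment variables; the install prefix is an assumption.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```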


How is big data helpful for people and businesses?

All kinds of data are useful for companies, since they can only understand and direct their customers as far as the acquired data allows. That is the main purpose of the business: understand your customers so that you can make them happy and keep them within the company's orbit. The only channel of communication, then, is the data provided by all those people. Companies hire data analysts and try to uncover some of the unknowns.
Continue Reading




What Are the Hot Topics for the Current Era of Machine Learning?

1. Deep learning [5] seems to be getting the most press right now. It is a form of Neural Network (with many neurons/layers). Articles on Deep Learning are currently being published in the New Yorker [1] and the New York Times [2].

2. Combining Support Vector Machines (SVMs) and Stochastic Gradient Descent (SGD) is also interesting. SVMs are really interesting and useful because you can use the kernel trick [10] to transform your Continue Reading
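As a concrete sketch of the SGD half of that combination, here is a linear SVM trained with stochastic subgradient steps on the hinge loss — plain NumPy, no kernel trick, and the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def sgd_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with SGD on the regularized hinge loss.

    Minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))),
    with labels y_i in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:
                # Margin violation: step on both the hinge and L2 terms.
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:
                # Correctly classified with margin: only the L2 shrinkage.
                w = (1 - lr * lam) * w
    return w, b
```

On a small linearly separable toy set, `np.sign(X @ w + b)` recovers the labels; the kernelized variant the text alludes to replaces the dot products with kernel evaluations.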
