Kohonen Learning Procedure K-Means vs Lloyd's K-means

K-means is perhaps the most common data quantization method, used widely across many problem domains. Although it relies on a very simple idea, it gives satisfying results at low computational cost.

Underlying the K-means formulation, the objective is to minimize the distance between each data point and its closest centroid (cluster center). We can write the objective as:

    \[\arg\min \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2\]


where \(\mu_i\) is the closest centroid to instance \(x_j\).
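To make the objective concrete, here is a minimal Python sketch that evaluates it for a fixed set of centroids. The function names (`squared_distance`, `closest_centroid`, `kmeans_objective`) are hypothetical and chosen only for this illustration; the sketch evaluates the cost and does not perform the optimization itself.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def closest_centroid(x, centroids):
    """Index of the centroid nearest to point x."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(x, centroids[i]))


def kmeans_objective(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(
        squared_distance(x, centroids[closest_centroid(x, centroids)])
        for x in points
    )


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0)]
    centroids = [(0, 1), (10, 0)]
    print(kmeans_objective(points, centroids))  # prints 2
```

Each point contributes the squared distance to its nearest centroid, so assigning points to clusters and summing per cluster gives the same value as the double sum in the formula.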





Some Useful Machine Learning Libraries.

With the advent of so many different and intricate machine learning algorithms, it is very hard to write your own code for every problem. Choosing a library is therefore an essential step before you start a project. However, there are many libraries with different quirks, written in different languages and sometimes spanning several of them, so the choice is not as straightforward as it seems.

Before you start, I strongly recommend experimenting with the library you are interested in, so that you do not end up saying "Ohh Buda!" at the end. As a simple guide, I will point out some possible libraries and mark some of them as my choices, with the reasons behind them.



Share Research Data Sets via Torrent

I came across a new but very bright idea today: sharing academic data sets and papers via torrent. Especially if you work with large-scale data sets like ImageNet, such a distributed approach is delightful (although it does not presently include ImageNet). With many ordinary downloads, the speed decays to very small values over time, and it gets even worse as more people download at once. In a torrent-based system, it is the other way around. If you are familiar with BitTorrent, you know that as the data spreads to more machines, you get faster download speeds owing to the nature of the torrent system.


Passing multiple arguments to Python multiprocessing.Pool

Python is a very bright language that is used by a wide variety of users and eases many pains.

One of the core features of Python that I use frequently is the multiprocessing module. It is a very efficient way to distribute embarrassingly parallel computations.

If you have read about the module and used it, at some point you will realize that Pool.map offers no direct way to pass multiple arguments to the parallelized function. Here I present a way to achieve this in a very Pythonic way.

In our example, we have two lists with the same number of elements, and they need to be fed together into the function that runs in the pool.

Here is self-explanatory code:

[gist]<script src="https://gist.github.com/erogol/8776285.js"></script>[/gist]
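For readers who cannot load the embedded gist, the following is a minimal sketch of one common way to do this. It is not necessarily the same code as the gist. The names `add`, `add_packed` and `run_parallel` are hypothetical: the two lists are zipped into tuples, and a small wrapper unpacks each tuple before calling the real function.

```python
from multiprocessing import Pool


def add(x, y):
    """The function we actually want to parallelize; it takes two arguments."""
    return x + y


def add_packed(args):
    """Pool.map passes one item per call, so unpack the tuple here."""
    return add(*args)


def run_parallel(xs, ys, processes=2):
    """Pair the two lists element by element and map over the pairs."""
    with Pool(processes) as pool:
        return pool.map(add_packed, zip(xs, ys))


if __name__ == "__main__":
    print(run_parallel([1, 2, 3], [10, 20, 30]))  # prints [11, 22, 33]
```

On Python 3.3 and later, `pool.starmap(add, zip(xs, ys))` does the unpacking for you, so the wrapper is only needed on older versions or when you want to stay with `Pool.map`.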


Extracting a sub-vector in C++

Suppose you have a vector in C++ and you want to extract a sub-vector over a given range. Here is a simple illustration of one possible way to do it.

[gist]<script src="https://gist.github.com/erogol/8631128.js"></script>[/gist]



Fundamental Sort Algorithms in Python

As a researcher who is rusty at coding, I spent some time revising my algorithm knowledge. Along the way, I coded the basic sorting algorithms in Python as good, concise practice (even though the result is not as efficient as C or C++). You might like to check the code to refresh your memory, or edit it to make it more efficient.
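As an illustration of the kind of code meant here, below is a short sketch of two basic sorting algorithms, insertion sort and merge sort. These are my own minimal versions written for this summary, not the code from the full post, and the function names are assumptions.

```python
def insertion_sort(items):
    """Return a sorted copy by growing a sorted prefix one element at a time."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def merge_sort(items):
    """Return a sorted copy by sorting each half and merging the halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left on ties keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    data = [5, 2, 9, 1, 5]
    print(insertion_sort(data))  # prints [1, 2, 5, 5, 9]
    print(merge_sort(data))      # prints [1, 2, 5, 5, 9]
```

Insertion sort takes quadratic time in the worst case, while merge sort takes O(n log n) at the cost of extra memory for the merged lists.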


Project Euler – Problem 14

Here is once again a very intricate problem from Project Euler. It has no solution sheet, as opposed to the other problems on the site. Therefore there is no consensus on the best solution.

Below is the problem. (I really suggest you look at some of the example sequences. They show really interesting behaviour. 🙂)

The following iterative sequence is defined for the set of positive integers:

n → n/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence: Continue Reading
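The rule above can be sketched in a few lines of Python. The function name `collatz_sequence` is hypothetical, and the loop assumes the sequence reaches 1, which holds for every starting value anyone has tested but has not been proven in general.

```python
def collatz_sequence(n):
    """Apply n -> n/2 (even) or n -> 3n + 1 (odd) until the value reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


if __name__ == "__main__":
    print(collatz_sequence(13))
    # prints [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```

Starting from 13, the chain contains 10 terms before it stops at 1.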


Project Euler – Problem 13

Here we have another good problem from Project Euler. You might want to work out the problem before seeing my solution.

The basic idea of my solution is not to use all the digits of the given numbers, but instead to extract only the part of each number that is needed to determine the first 10 digits of the sum. I try to explain my approach at the top of the source code, with my limited math English. If you have any trouble with that part, please leave me a comment.
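The idea can be sketched roughly as follows. This is not the solution from the full post, only an illustration under stated assumptions: the numbers arrive as equal-length digit strings, and the function name `leading_digits_of_sum` and the `guard` parameter are hypothetical. Dropping the tail of each number loses less than one unit in the last kept place, so the total error stays below the count of numbers, and a few guard digits normally absorb it. A long run of 9s could still carry into the leading digits, so this is a heuristic and not a guarantee.

```python
def leading_digits_of_sum(numbers, n_digits=10, guard=3):
    """Approximate the first n_digits of the sum of equal-length digit strings.

    Only the first n_digits + guard digits of each number are added.
    """
    keep = n_digits + guard
    truncated_total = sum(int(s[:keep]) for s in numbers)
    return str(truncated_total)[:n_digits]


if __name__ == "__main__":
    # Made-up sample data, not the numbers from the actual problem.
    sample = [
        "12345678901234567890",
        "98765432109876543210",
        "55555555555555555555",
    ]
    exact = str(sum(int(s) for s in sample))[:10]
    print(leading_digits_of_sum(sample), exact)
```

Since Python integers have arbitrary precision, the exact sum is also easy to compute directly, which makes it convenient for checking the truncated version.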