K-means is perhaps the most common data quantization method, used widely across many different problem domains. Even though it relies on a very simple idea, it gives satisfying results in a computationally efficient way.
Underneath the K-means formulation, the objective is to minimize the (squared) distance between each data point and its closest centroid (cluster center). In the usual notation, we can write the objective as:
$$J = \sum_{i=1}^{N} \lVert x_i - \mu_{c(i)} \rVert^2$$
where $\mu_{c(i)}$ is the closest centroid to instance $x_i$.
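As a rough illustration of the objective described above, here is a minimal Python sketch that evaluates it for a fixed set of centroids. It assumes squared Euclidean distance, and the function names (`sq_dist`, `closest_centroid`, `kmeans_objective`) are made up for this example. It only computes the cost and does not run the K-means iterations themselves.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def kmeans_objective(points, centroids):
    """Sum over all points of the squared distance to their closest centroid."""
    return sum(
        sq_dist(p, centroids[closest_centroid(p, centroids)]) for p in points
    )


# Toy data (hypothetical): two points near (0, 1) and one sitting on (10, 10).
points = [(0, 0), (0, 2), (10, 10)]
centroids = [(0, 1), (10, 10)]
print(kmeans_objective(points, centroids))  # prints 2
```

K-means itself alternates between assigning points to their closest centroid and moving each centroid to the mean of its points, and this cost is the quantity those steps try to push down.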
Especially with the advent of many different and intricate Machine Learning algorithms, it is very hard to write your own code for every problem. Therefore, choosing a library is an imperative step before you start a project. However, there are many libraries with different quirks, written in different languages and sometimes spanning several languages, so the choice is not as straightforward as it seems.
Before you start, I strongly recommend that you experiment with the library you are interested in, so as not to say "Ohh Buda!" at the end. As a simple guide, I will point out some possible libraries and mark some of them as my choices, with the reasons behind them.
I came across a new but very bright idea today. The idea is to share academic data sets and papers via torrent. Especially if you are working with large-scale data sets like ImageNet, such a distributed approach is just delightful (albeit it presently does not include ImageNet). With ordinary downloads, your speed often decays to very small values over time, and it gets even worse as more people download at the same time. In a torrent-based system, it is the other way around. If you are familiar with BitTorrent, you know well that as the data is distributed to more machines, you get faster download speeds owing to the nature of the torrent system.
Python is a very bright language that is used by a wide variety of users and takes away a lot of pain.
One of the core features of Python that I frequently use is the multiprocessing module. It is a very efficient way to distribute embarrassingly parallel computation.
If you have read about the module and got used to it, at some point you will realize that the pool's map function passes only a single argument to the parallelized function. Now, I will present a way to pass multiple arguments in a very Pythonic way.
In our example, we have two lists with the same number of elements, and they need to be fed together into the function that the pool runs.
Here is the self-contained code:
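The following is a minimal sketch of one way to do this, not necessarily the code from the original post. It assumes the two lists are paired element by element, and the names `multiply`, `multiply_star` and `parallel_multiply` are hypothetical. The trick is a small wrapper that receives one tuple and unpacks it into the real function.

```python
import multiprocessing


def multiply(a, b):
    """The function we actually want to run in parallel (takes two arguments)."""
    return a * b


def multiply_star(args):
    """Unpack one (a, b) tuple, since Pool.map hands over a single argument."""
    return multiply(*args)


def parallel_multiply(first, second, processes=2):
    """Pair the two lists element by element and map the wrapper over the pairs."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(multiply_star, zip(first, second))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_multiply([1, 2, 3], [4, 5, 6]))  # prints [4, 10, 18]
```

On Python 3.3 and later, `Pool.starmap` does the unpacking for you, so the wrapper is only needed on older versions or when you want to stay with `map`.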
In this post, I try to illustrate one use case of overriding the comparison for std::sort on top of a simple problem. Our problem is as follows:
Write a method to sort an array of strings so that all the anagrams are next to each other.
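The post itself works with a custom comparison for C++ std::sort. As a rough analogue in Python, and not the original solution, the sketch below sorts with a comparator that looks only at each word's sorted letters, so anagrams compare as equal and end up side by side. The names `signature`, `compare_anagrams` and `sort_anagrams` are made up for this example.

```python
from functools import cmp_to_key


def signature(word):
    """Letters of the word in sorted order; anagrams share the same signature."""
    return "".join(sorted(word))


def compare_anagrams(a, b):
    """Comparator returning -1, 0 or 1 based on the two signatures."""
    sa, sb = signature(a), signature(b)
    return (sa > sb) - (sa < sb)


def sort_anagrams(words):
    """Return a new list in which all anagrams sit next to each other."""
    return sorted(words, key=cmp_to_key(compare_anagrams))


print(sort_anagrams(["tea", "dog", "ate", "god", "eat"]))
# prints ['tea', 'ate', 'eat', 'dog', 'god']
```

In everyday Python you would pass `key=signature` directly; the comparator form is used here only because it mirrors the comparison-overriding idea.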
Suppose you have a vector in C++ and you want to extract a sub-vector given some range. Here is a simple illustration of one possible way to do it.
Gist: https://gist.github.com/erogol/8631128
As a researcher who has grown rusty at coding, I spent some time revising my algorithm knowledge. Along the way, I coded the basic sorting algorithms in Python as a good and concise exercise (even though the result is not as efficient as C or C++). You might like to check the code to refresh your memory, or edit it to make it more efficient.
Here is once again a very intricate problem from Project Euler. It has no solution sheet, as opposed to the other problems on the site. Therefore, there is no consensus on the best solution.
Below is the problem: (I really suggest that you look at some of the example sequences. They have really interesting behaviours. 🙂 )
The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
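To try the rule out yourself, here is a small Python sketch that applies it step by step. The function name `collatz_sequence` is made up for this example, and stopping once the value reaches 1 is an assumption of the sketch.

```python
def collatz_sequence(start):
    """Apply n -> n/2 (even) or n -> 3n + 1 (odd) until the value reaches 1."""
    if start < 1:
        raise ValueError("start must be a positive integer")
    sequence = [start]
    n = start
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(13))
# prints [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```

Calling it on a few other starting values is a quick way to see how differently the chains behave.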
Here we have another quality problem from Project Euler. You might want to work out the problem before seeing my solution.
The basic idea of my solution is not to use all the digits of the given numbers, but to extract only the part of each number that is needed in the sum to determine the first 10 digits of the result. I try to explain my approach at the top of the source code with my limited math English. If you have any trouble with that part, please leave me a comment.
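The idea of summing only the leading part of each number can be sketched roughly as follows. This is not the author's code: the function name, the `keep` and `want` parameters and the toy numbers are all made up, and the sketch assumes every number is a digit string of the same length. Because the dropped digits can still produce a carry, `keep` should be chosen with a few digits of margin over `want`.

```python
def leading_digits_of_sum(numbers, keep, want):
    """Sum only the first `keep` digits of each number, return the first `want` digits.

    `numbers` is a list of digit strings that all have the same length.
    """
    truncated_total = sum(int(num[:keep]) for num in numbers)
    return str(truncated_total)[:want]


# Toy input (hypothetical): three 6-digit numbers.
numbers = ["123456", "654321", "111111"]
print(leading_digits_of_sum(numbers, keep=4, want=3))  # prints 888

# Exact sum for comparison: 888888, whose first three digits are also 888.
print(str(sum(int(n) for n in numbers))[:3])  # prints 888
```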