Posts tagged with: machine learning

"No free lunch" theorem lies about Random Forests


I've read a great paper by Delgado et al., "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?", in which they compare 179 different classifiers from 17 families on 121 data sets, made up of the whole UCI database plus some real-world problems. The classifiers come from R (with and without the caret package), C and Matlab (I wish I could see Sklearn as well).

I really recommend reading the paper in detail, but I will share some of the highlights here. The most impressive result is the performance of the Random Forest (RF) implementations. Across the data sets, RF is consistently among the top performers. It reaches 94.1% of the maximum accuracy and exceeds 90% of it in 84.3% of the data sets. Also, 3 out of the 5 best classifiers are RF variants. This is pretty impressive, I guess. The runner-up is the SVM with Gaussian kernel implemented in LibSVM, which achieves 92.3% of the maximum accuracy. The paper points to RF, SVM with Gaussian and Polynomial kernels, Extreme Learning Machines with Gaussian kernel, C5.0 and avNNet (a committee of MLPs implemented in R with the caret package) as the top algorithms after their experiments.
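To make the two headline numbers concrete, here is a minimal sketch of how I read the "percent of maximum accuracy" figures: on each data set, a classifier's accuracy is divided by the best accuracy any classifier reached there, and the ratios are then averaged. The accuracy values below are made up for illustration, and the function names and the `>= 90` threshold rule are my own assumptions, not taken from the paper.

```python
# Hypothetical accuracies (in %) of three classifiers on four data sets.
# These numbers are invented for illustration; they are NOT from the paper.
accuracies = {
    "rf":  [90.0, 80.0, 70.0, 95.0],
    "svm": [88.0, 82.0, 60.0, 95.0],
    "knn": [72.0, 75.0, 65.0, 76.0],
}


def percent_of_max(accuracies):
    """Express each accuracy as a percentage of the best accuracy
    reached by any classifier on the same data set."""
    n_datasets = len(next(iter(accuracies.values())))
    best = [max(acc[i] for acc in accuracies.values()) for i in range(n_datasets)]
    return {
        name: [100.0 * acc[i] / best[i] for i in range(n_datasets)]
        for name, acc in accuracies.items()
    }


def summarize(ratios, threshold=90.0):
    """Return, per classifier, the mean percent-of-max and the share of
    data sets (in %) where the classifier reaches the threshold."""
    summary = {}
    for name, r in ratios.items():
        mean_ratio = sum(r) / len(r)
        share_above = 100.0 * sum(x >= threshold for x in r) / len(r)
        summary[name] = (mean_ratio, share_above)
    return summary


summary = summarize(percent_of_max(accuracies))
for name, (mean_ratio, share_above) in summary.items():
    print(f"{name}: {mean_ratio:.1f}% of max accuracy, "
          f">= 90% of max on {share_above:.1f}% of data sets")
```

Read this way, a classifier can score high on the first number while still missing the 90% mark on a few hard data sets, which is why the two figures are reported together.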

One shortcoming of the paper, from my beloved NN perspective, is that the Neural Network models used are not very up to date; there are no drop-out or max-out networks, for example. Therefore, it is hard to evaluate the other algorithms against these advanced NN models. However, for anyone lost in the dark of algorithm choices, it is quite a good guideline that shows the power of RF and SVM against the others.


ML Work-Flow (Part 4) – Sanity Checks and Data Splitting

SANITY CHECK

We are now one step past Feature Extraction: we have extracted a statistically meaningful (covariate) representation of the given raw data. Right after Feature Extraction, the first thing we need to do is check the values of the new representation. In general, people tend to skip this step and regard it as a waste of time. However, I believe this is a serious mistake. As I stated before, a single NULL value or a skewed representation might cause a very big pain at the end and leave you in a very hazy situation.
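As a minimal sketch of what such a check could look like, assuming the features are stored as a list of equal-length numeric rows: the function below reports missing values, non-finite values and constant columns. The function name and the choice of these three checks are illustrative assumptions, not the full list of steps from the post.

```python
import math


def sanity_check(rows):
    """Scan a feature matrix (list of equal-length numeric rows) and report
    missing values, non-finite values (NaN/inf) and constant columns."""
    n_cols = len(rows[0])
    report = {"missing": [], "non_finite": [], "constant_columns": []}

    # Cell-level checks: record the (row, column) position of each problem.
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is None:
                report["missing"].append((r, c))
            elif not math.isfinite(value):
                report["non_finite"].append((r, c))

    # Column-level check: a column with at most one distinct valid value
    # carries no information for a classifier.
    for c in range(n_cols):
        valid = {row[c] for row in rows
                 if row[c] is not None and math.isfinite(row[c])}
        if len(valid) <= 1:
            report["constant_columns"].append(c)
    return report


# Hypothetical toy feature matrix with one problem of each kind.
features = [
    [1.0, 5.0, 0.0],
    [2.0, None, 0.0],
    [float("nan"), 7.0, 0.0],
]
report = sanity_check(features)
print(report)
```

Running a check like this right after Feature Extraction is cheap, and it catches the single stray NULL before it propagates into training.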

Let's start our discussion. Here are my Sanity Check steps:


ML Work-Flow (Part 3) - Feature Extraction

In this post, I'll talk about the details of Feature Extraction (aka Feature Construction, Feature Aggregation …) on the path to successful ML. Finding good feature representations is a domain-dependent process and it has an important influence on your final results. Even if you keep all other settings the same, different Feature Extraction methods can give drastically different results in the end. Therefore, choosing the correct Feature Extraction methodology requires painstaking work.

Feature Extraction is the process of converting the given raw data into a set of instance points embedded in a standardized, distinctive and machine-understandable space. Standardized means comparable representations of the same length, so that you can compute similarities or differences between instances that initially have very different structures (like documents of different lengths). Distinctive means having different feature values for instances of different classes, so that we can observe clusters of different classes in the new data space. A machine-understandable representation is mostly a numerical representation of the given instances. You can understand any document by reading it, but machines only understand the semantics implied by the numbers.
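The "standardized" property can be illustrated with a plain bag-of-words count, a technique picked here only as an example: documents of different lengths are mapped to vectors of the same length, one slot per vocabulary word. The helper names and the two toy documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect every distinct lowercase word and fix an order for them."""
    words = {word for doc in documents for word in doc.lower().split()}
    return sorted(words)


def extract_features(document, vocabulary):
    """Map a document of any length to a fixed-length vector of word counts."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Two toy documents of different lengths.
documents = [
    "the cat sat",
    "the dog sat on the mat near the door",
]
vocabulary = build_vocabulary(documents)
features = [extract_features(doc, vocabulary) for doc in documents]

for doc, vec in zip(documents, features):
    print(len(doc.split()), "words ->", vec)
```

Both documents end up as vectors with one entry per vocabulary word, so their similarity can be computed directly, which is not possible on the raw text.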


ML Work-Flow (Part 2) - Data Preprocessing

I try to keep to my promised schedule as much as possible. Here is the detailed discussion of the first step of my proposed Machine Learning Work-Flow, that is, Data Preprocessing.

Data Preprocessing is an important step which mostly aims to improve raw data quality before you delve into the technical concerns. Even though this step involves very easy tasks, without it you might observe very wrong or even freakish results at the end.

I also stated in the work-flow that Data Preprocessing is a statistical job rather than an ML one. By this I mean that Data Preprocessing demands good data inference and analysis before any decision you make. These components are not the subjects of an ML course but of a Statistics course. Hence, if you aim to be cannier at ML as a whole, do not ignore statistics.

We can divide Data Preprocessing into 5 different headings:

  1. Data Integration
  2. Data Cleaning
  3. Data Transformation
  4. Data Discretization
  5. Data Reduction
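To give a feel for two of these headings, here is a minimal sketch of one Data Transformation (min-max scaling to the range 0 to 1) and one Data Discretization (equal-width binning). These two techniques, the function names and the toy `ages` values are illustrative choices, not the specific methods covered in the full post.

```python
def min_max_scale(values):
    """Data Transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data Discretization: assign each value to one of n_bins
    equal-width intervals, numbered from 0 to n_bins - 1."""
    scaled = min_max_scale(values)
    # The maximum value scales to 1.0, so clamp it into the last bin.
    return [min(int(s * n_bins), n_bins - 1) for s in scaled]


# Hypothetical raw attribute.
ages = [18, 22, 35, 47, 60]
scaled = min_max_scale(ages)
bins = equal_width_bins(ages, 3)

print([round(s, 3) for s in scaled])
print(bins)
```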


Machine Learning Work-Flow (Part 1)

I am planning to write a series of posts explaining a basic Machine Learning work-flow (mostly supervised). In this post, my goal is to give a bird's-eye view; I'll delve into the details of each component in later posts. I decided to write this series for two reasons: the first is self-education, to get all my bits and pieces together after a period of theoretical research and industrial practice; the second is to present a simple guide for beginners and enthusiasts.

Below, we have the overview of the proposed work-flow. A color code indicates the knowledge base each component relies on. Each box has a color tone from YELLOW to RED. The yellower the box, the more the component relies on the Statistics knowledge base. As the box turns red (gets darker), the component depends more heavily on the Machine Learning knowledge base. By this I also imply that, without a good statistical understanding, we are not able to construct a sound machine learning pipeline. As a footnote, this schema is changed by the post-modernism of Representation Learning algorithms, and I'll touch on this in later posts.

 


I presented my Master Dissertation?

I am glad to have finally presented my master's dissertation, and I collected valuable feedback from my community. Earlier, I shared what I had done for the thesis in different posts. CMAP (Concept Map) and FAME (Face Association through Model Evolution; I called it AME in the thesis to be more generic) are basically two different methods for mining visual concepts from noisy image sources such as Google Image Search or Flickr. You might prefer to look at those posts for details, or see the presentation I posted here for a brief view of my work.

 

 


FAME: Face Association through Model Evolution

Here, I summarize a new method called FAME for learning Face Models from a noisy set of web images. I am studying this for my MS Thesis. As a little intro to my thesis: the title is "Mining Web Images for Concept Learning" and it introduces two new methods for automatic learning of visual concepts from noisy web images. The first proposed method is FAME; the other, ConceptMap, was presented here before and has been accepted for ECCV14 (self promotion :)).

Before I start, I should disclaim that FAME is not a fully finished work and is awaiting your valuable comments. Please leave your remarks about anything you find useful, ridiculous, awkward or great.

In this work, we tackle the problem of learning face models for public figures from images collected from the web by querying a particular person's name. The collected images are called weakly-labelled because of the rough description given by the query. However, the data is very noisy even after face detection, with false detections or several irrelevant faces ...


Our ECCV2014 work "ConceptMap: Mining noisy web data for concept learning"

---- I am living the joy of seeing my paper title on the list of accepted ECCV14 papers :). Seeing the outcome of your work makes all your day-and-night efforts worthwhile, REALLY!!! Before I start, I shall thank my supervisor Pinar Duygulu for her great guidance. ----

In this post, I would like to summarize the work in the title, since I believe a friendly blog post can sometimes be more expressive than a solid scientific article.

"ConceptMap: Mining noisy web data for concept learning" proposes a pipeline for learning a wide range of visual concepts by only defining a query to an image search engine. The idea is to query a concept at the service and download a huge bunch of images, cluster the images while removing the irrelevant instances, and learn a model from each of the clusters. In the end, each concept is represented by the ensemble of these classifiers.
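The shape of this pipeline (cluster, remove irrelevant instances, learn one model per cluster, combine the models) can be sketched on toy 2-D points. This is only a stand-in under loud assumptions: plain k-means, a median-distance pruning rule and a centre-plus-radius "model" are my own simplifications and are NOT the clustering method or the classifiers used in the paper. All names and data points below are hypothetical.

```python
import math
from statistics import median


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, centroids, n_iter=10):
    """Plain k-means with caller-supplied initial centroids (a stand-in
    for the clustering step; not the paper's method)."""
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        centroids = [mean_point(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return clusters


def prune(cluster, factor=2.0):
    """Drop members farther from the cluster centre than
    factor * median distance (assumed rule for 'irrelevant' instances)."""
    center = mean_point(cluster)
    dists = [math.dist(p, center) for p in cluster]
    limit = factor * median(dists)
    return [p for p, d in zip(cluster, dists) if d <= limit]


def fit_concept(points, initial_centroids):
    """Learn one (center, radius) model per cleaned cluster."""
    models = []
    for cluster in kmeans(points, initial_centroids):
        if not cluster:
            continue
        kept = prune(cluster)
        center = mean_point(kept)
        radius = max(math.dist(p, center) for p in kept)
        models.append((center, radius))
    return models


def matches_concept(point, models, slack=1.5):
    """Ensemble decision: accept if any per-cluster model accepts."""
    return any(math.dist(point, c) <= slack * r for c, r in models)


# Toy "image features": two tight groups plus one irrelevant point.
images = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 30.0)]
models = fit_concept(images, [(0.0, 0.0), (10.0, 10.0)])
print(models)
print(matches_concept((10.4, 10.6), models), matches_concept((5.0, 30.0), models))
```

The point of the sketch is the structure: the irrelevant point is discarded before any model is learned, and the concept is represented by several small models rather than a single one.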


Does large data really help for Object Detection?

I stumbled upon an interesting BMVC 2012 paper (Do We Need More Training Data or Better Models for Object Detection? -- Zhu, Xiangxin, Vondrick, Carl, Ramanan, Deva, Fowlkes, Charless). It claims something contrary to the current big data notion, which advocates large data sets so as to learn better models as the training data size increases. The paper instead states that large training data is not that helpful for learning better models; indeed, more data can be harmful without careful tuning of your system!!
