I try to keep my promised schedule on as much as possible. Here is the detailed the first step discussion of my proposed Machine Learning Work-Flow, that is Data Preprocessing.
Data Preprocessing is an important step in which mostly aims to improve raw data quality before you dwell into the technical concerns. Even-though this step involves very easy tasks to do, without this, you might observe very false or even freaking results at the end.
I also stated at the work-flow that, Data Preprocessing is statistical job other than ML. By saying this, Data Preprocessing demands good data inference and analysis just before any possible decision you made. These components are not the subjects of a ML course but are for a Statistics. Hence, if you aim to be cannier at ML as a whole, do not ignore statistics.
We can divide Data Preprocessing into 5 different headings;
- Data Integration
- Data Cleaning
- Data Transformation
- Data Discretization
- Data Reduction