Data Mining and Knowledge Discovery
Data mining is an integral part of the knowledge discovery in databases (KDD) process, which is the overall process of converting raw data into useful information.
The input data can be stored in a variety of formats and may reside in a central repository or be distributed across multiple sites. The purpose of data pre-processing is to transform the raw input data into an appropriate format for subsequent analysis.
The steps involved in data pre-processing include:
- Combining data from various sources
- Cleaning data to remove noise and duplicate information
- Selecting records and features that are relevant to the data mining task at hand
Because of the many ways in which data can be collected and stored, data pre-processing is often the most time-consuming step in the overall KDD process.
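The three pre-processing steps listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two sources, the field names (id, age, city, spend), and the rule that treats a missing or implausible age as noise are all hypothetical and are not taken from any particular system.

```python
# Hypothetical records from two sources; all field names are illustrative.
source_a = [
    {"id": 1, "age": 34, "city": "Pune", "spend": 120.0},
    {"id": 2, "age": -1, "city": "Delhi", "spend": 80.0},   # noisy age value
]
source_b = [
    {"id": 1, "age": 34, "city": "Pune", "spend": 120.0},   # duplicate of id 1
    {"id": 3, "age": 51, "city": "Mumbai", "spend": 200.0},
]


def combine(*sources):
    """Step 1: combine records from several sources into one list."""
    return [record for source in sources for record in source]


def is_noisy(record):
    """Assumed noise rule: age is missing or outside the range 0..120."""
    age = record.get("age")
    return age is None or not (0 <= age <= 120)


def clean(records, key="id"):
    """Step 2: drop noisy records and records whose key was already seen."""
    seen = set()
    cleaned = []
    for record in records:
        if is_noisy(record) or record[key] in seen:
            continue
        seen.add(record[key])
        cleaned.append(record)
    return cleaned


def select(records, features):
    """Step 3: keep only the features needed for the mining task."""
    return [{name: record[name] for name in features} for record in records]


# Run the steps in order: combine, then clean, then select.
prepared = select(clean(combine(source_a, source_b)), ["age", "spend"])
print(prepared)
```

In practice each step is usually far more involved (schema matching, fuzzy duplicate detection, feature selection methods), which is one reason pre-processing consumes so much of the overall effort.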
The post-processing step ensures that only relevant results are incorporated into the decision support system. Data visualization is an example of post-processing: it allows analysts to explore the data and the data mining results from a variety of viewpoints.
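As a rough illustration of post-processing, the sketch below filters a list of mining results and prints a simple text chart of what remains. The result structure, the "score" field, and the 0.7 threshold are assumptions made for the example; the text above does not say how relevance is measured.

```python
# Hypothetical mining output: each result carries an assumed relevance score.
results = [
    {"pattern": "pattern A", "score": 0.91},
    {"pattern": "pattern B", "score": 0.42},
    {"pattern": "pattern C", "score": 0.77},
]


def postprocess(results, min_score=0.7):
    """Keep results at or above the threshold, highest score first."""
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


relevant = postprocess(results)

# A very small text "visualization": one bar per retained result.
for r in relevant:
    bar = "#" * round(r["score"] * 20)
    print(f'{r["pattern"]:<10} {bar} {r["score"]:.2f}')
```

Only the results that pass the filter would be handed on to the decision support system; a real deployment would typically use a plotting library and domain-specific relevance measures instead of a fixed threshold.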