Data Mining and Knowledge Discovery


Data mining is an integral part of the knowledge discovery in databases (KDD) process, which is the overall process of converting raw data into useful information.

Data Pre-Processing

The input data can be stored in a variety of formats, and it may reside in a central repository or be distributed across multiple sites. The purpose of data pre-processing is to transform the raw input data into an appropriate format for subsequent analysis.

The steps involved in data pre-processing include:

  1. Combining data from various sources
  2. Cleaning data to remove noise and duplicate records
  3. Selecting the records and features that are useful for the data mining task at hand

Because data can be collected and stored in so many ways, data pre-processing is often the most time-consuming step in the overall KDD process.
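
The three pre-processing steps listed above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the two sources, the field names (id, age, city, spend), and the preprocess function are all hypothetical, and treating a missing value as the only kind of noise is a simplifying assumption made for this example.

```python
# Hypothetical records from two sources; all field names are illustrative.
store_a = [
    {"id": 1, "age": 34, "city": "Austin", "spend": 120.0},
    {"id": 2, "age": None, "city": "Boston", "spend": 80.5},  # missing age
]
store_b = [
    {"id": 1, "age": 34, "city": "Austin", "spend": 120.0},  # duplicate of a record in store_a
    {"id": 3, "age": 51, "city": "Denver", "spend": 42.0},
]


def preprocess(sources, features):
    """Combine, clean, and select records (a sketch under the assumptions above)."""
    # Step 1: combine data from the various sources into one list.
    combined = [record for source in sources for record in source]

    # Step 2: clean the data. Drop exact duplicates, then drop records
    # that have a missing value in any feature of interest.
    seen = set()
    cleaned = []
    for record in combined:
        key = tuple(sorted(record.items(), key=lambda item: item[0]))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        if any(record.get(name) is None for name in features):
            continue  # assumption: a missing value marks the record as unusable
        cleaned.append(record)

    # Step 3: keep only the features needed for the mining task.
    return [{name: record[name] for name in features} for record in cleaned]


result = preprocess([store_a, store_b], ["id", "age", "spend"])
for row in result:
    print(row)
```

In practice each step is usually far more involved (for example, reconciling different schemas when combining sources, or detecting near-duplicates rather than exact ones), which is one reason pre-processing takes so much time.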

Data Post-Processing

The post-processing step ensures that only relevant results are incorporated into the decision support system. Data visualization is one example of post-processing: it allows analysts to explore the data and the data mining results from a variety of viewpoints.