Basic Data Mining Tasks
Data mining tasks are generally divided into two major categories.
Predictive Tasks:- The objective of predictive tasks is to predict the value of a particular attribute based on the values of other attributes. The attribute to be predicted is commonly known as the target or dependent variable, while the attributes used for making the prediction are known as explanatory or independent variables.
Descriptive Tasks:- The objective of descriptive tasks is to derive patterns (correlations, trends, clusters, trajectories, and anomalies) that summarize the underlying relationships in data. Descriptive data mining tasks are often exploratory in nature and frequently require post-processing techniques to validate and explain the results.
Predictive modeling refers to the task of building a model for the target variable as a function of the explanatory variables. In other words, we build a model that predicts the target variable from the explanatory variables.
There are two types of predictive modeling tasks:
- Classification (for discrete target variables)
- Regression (for continuous target variables)
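To make the difference between the two task types concrete, here is a minimal sketch in plain Python. The techniques (a least-squares line for regression, a 1-nearest-neighbor rule for classification), the function names, and the toy data are all assumptions chosen for illustration; they are only two of many possible models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

def nearest_neighbor_label(train, query):
    """Classification: return the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]

# Regression: the target (price) is continuous. Toy data, made up.
sizes = [50, 80, 100, 120]            # explanatory variable
prices = [150.0, 240.0, 300.0, 360.0]  # target variable
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept

# Classification: the target (a size class) is discrete. Toy data, made up.
labeled = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
predicted_label = nearest_neighbor_label(labeled, 7.0)

print(predicted_price, predicted_label)
```

In both cases the model maps explanatory values to a target value; only the type of the target differs.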
Association analysis is used to discover patterns that describe strongly associated features in data. The discovered patterns are typically represented in the form of implication rules or feature subsets. Because of the exponential size of its search space, the goal of association analysis is to extract the most interesting patterns in the most efficient manner.
- Finding groups of genes that have similar functionality
- Finding Web pages that are accessed together
- Understanding the relationship between different elements of Earth’s climate system
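As a rough illustration of the Web page example, the sketch below counts how often pairs of pages occur in the same session and scores one implication rule. The measures used here (support and confidence), the threshold of 0.6, and the session data are assumptions introduced for the example; the text above does not prescribe a specific measure of interestingness.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions: each set holds the pages visited together.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
    {"home", "products", "cart"},
]

def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the share that also has the consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)

support = pair_support(sessions)
# Keep only pairs that co-occur in at least 60% of sessions (assumed threshold).
frequent_pairs = {pair: s for pair, s in support.items() if s >= 0.6}
# Score the implication rule {cart} -> {products}.
rule_confidence = confidence(sessions, "cart", "products")

print(frequent_pairs, rule_confidence)
```

This brute-force enumeration only looks at pairs. Extending it to larger feature subsets is what makes the search space grow exponentially, which is why practical algorithms such as Apriori prune candidates instead of enumerating them all.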
Cluster analysis seeks to find groups of closely related observations so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
- Grouping sets of related customers
- Finding areas of the ocean that have a significant impact on Earth’s climate
- Data compression
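One way to sketch the customer grouping example is with k-means, shown below on one-dimensional data. k-means is a swapped-in choice for illustration, not a method named in the text above; the function name `k_means_1d`, the starting centers, and the spending figures are likewise hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 95, 100, 105]
# Deliberately poor starting centers; the iterations correct them.
centers, clusters = k_means_1d(spend, centers=[10, 12])

print(centers, clusters)
```

The result separates low spenders from high spenders, and the two centers could stand in for all six values, which hints at why clustering is also listed as a form of data compression.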
Anomaly detection is the task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are known as anomalies or outliers. The goal of anomaly detection is to discover the real anomalies while avoiding falsely labeling normal objects as anomalous.
- Fraud detection
- Network intrusion detection systems
- Finding unusual patterns of disease
- Finding ecosystem disturbances
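As a minimal sketch of the fraud detection example, the code below flags values that lie far from the mean in units of standard deviation (a z-score rule). The z-score rule, the threshold of 2.0, and the transaction amounts are assumptions made for illustration; real systems typically use more robust methods.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = z_score_outliers(amounts)

print(outliers)
```

The threshold controls the trade-off described above: lowering it catches more real anomalies but labels more normal observations as anomalous, while raising it does the opposite.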