Rule based classifier

A rule-based classifier is a technique for classifying records using a collection of “if…then…” rules. The rules for the model are represented in disjunctive normal form, $R = (r_{1} \cup r_{2} \cup r_{3}…)$, where $R$ is known as the rule set and $r_{i}$'s are the classification rules or disjuncts. Each classification rule can be expressed in the following way: $$r_{i}: (Condition_{i}) \to y_{i}$$ The left-hand side of the rule is called the rule antecedent or precondition. It contains a conjuction of attribute tests: $$Condition_{i} = (A_{1}\ op\ v_{1}) \cap (A_{2}\ op\ v_{2}) \cap … (A_{k}\ op\ v_{k})$$ where ($A_{j}, v_{j}$) is an attribute-value pair and $op$ is a logical operator chosen from the set {$=,\neq,<,>,\geq,\leq$}. Each attribute ($A_{j}\ op\ v_{j}$) is known as a conjunct. The right side of the rule is called the **rule consequent**, which contains a predicted class $y_{i}$.

Consider the following example of ruleset for vertebrate classification problem.

Quality of classification

The quality of classification can be evaluated using measures such as coverage and accuracy. Given a data set D and a classification rule $r: A \to y$, the coverage of the rule is defined as the fraction of records in D that trigger the rule $r$. On the other hand, it’s accuracy or confidence factor is defined as the fraction of records triggered by r whose class labels are equal to $y$. The formal definition of these measures are $$Coverage = \frac{|A|}{|D|}$$ and $$Accuracy = \frac{|A \cap y|}{|A|}$$ Where $|A|$ is the number of records that satisfy both the antecedent and consequent, and $|D|$ is the total number of records.

Property of rules

  • Mutually exclusive rules:- The rules in the set R are mutually exclusive if no two rules in R are triggered by the same record. This property ensures that every record is covered by atmost one rule in R.
  • Exhaustive rules:- A rule set R has exhaustive coverage if there is a rule for every combination of attribute values. This property ensures that every property is covered by atleast one rule in R.

Together, these properties ensure that every record can be covered by exactly one rule. If a rule set is not exhaustive, then a default rule, $r_{d}: () \to y_{d}$, must be added to cover the remaining cases. A default rule has an empty antecedent and is triggered when all other rules have failed. $y_{d}$ is known as the default class of training records not covered by the existing rules.

Ordered and unordered rules

If the rule set is mutually exclusive, then a record can be covered by several rules, some of which may predict conflicting classes. There are two ways to overcome this problem.

  • Ordered rules:- In this approach, the rules in a rule set are ordered in decreasing order of their priority, which can be defined in many ways (e.g., based on accuracy, coverage, total description length, or the order in which rules are generated). An ordered rule set is also known as a decision list. When a test record is presented, it is classified by the highest ranked rule that covers the record. This avoids the problem of having conflicting classes predicted by multiple classification rules.
  • Unordered rules:- This approach allows a test record to trigger multiple classification rules and considers the consequent of each rule as a vote for a particular case. The votes are then tallied to determine the class label of the test record. The record is usually assigned to the class that receives the higher number of votes. In some cases, the vote may be weighted by the rule’s accuracy. Using unordered rules to build a rule based classifier has both advantages and disadvantages.
    • Unordered rules are less susceptible to the errors caused by the wrong rule being selected to classify a test record.
    • Model building is also less expensive because the rules not have to be kept in sorted order.
    • Classifying a test record can be quite an expensive task because the attributes of the test record must be compared against the precondition of every rule in the rule set.

Rule ordering schemes

Rule ordering can be implemented on a rule-by-rule basis or on a class-by-class basis.

  • Rule-Based Ordering Scheme:- This approach orderes the individual rules by some rule quantity measure. This ordering scheme ensures that every test record is classified by the “best” rule covering it. A potential drawback of this scheme is that lower-ranked rules are much harder to interpret because they assume the negation of the rules preceding them.
  • Class-based Ordering Scheme:- In this approach, rules that belong to the same class appear together in the rule set R. The rules are then collectively sorted on the basis of their class information. The relative ordering among the rules from the same class is not important; as long as one of the rules fires, the class will be assigned to the test record. This make rule interpretation slightly easier. However, it is possible for a high quality rule to be overlooked in favor of an inferior rule that happens to predict the higher-ranked class.

Characterstics of rule based classifier

A rule based classifier has following charactersticks

  • The expressiveness of a rule set is almost equivalent to that of a decision tree because a decision tree can be represented by a set of mutually exclusive and and exhaustive rules. Both rule based and decision tree classifier create rectilinear partition of the attribute space and assign a class to each partition. Nevertheless, if the rule based classifier allows multiple rules to be triggered for a given record, then a more complex decision boundary can be constructed.
  • Rule-based classifier are generally used to produce descriptive models that are easier to interpret, but gives compareable performance to the decision tree classifier.
  • The class based ordering approach adopted by many rule-based classifiers (such as RIPPER) is well suited for handling data sets with imbalanced class distributions.