What is Multiple Classification Ripple Down Rule (MCRDR)?

3. Evaluation

Experiment Design
Three questions are critical for MCRDR. The first is the performance of the knowledge-based system (KBS), that is, how well it classifies test cases. The second is the size of the knowledge base (KB), because MCRDR, unlike inductive machine learning methods, makes no attempt to keep the KB compact. The third is the complexity of the knowledge acquisition (KA) task.
Two domains are used for the evaluation (Table 1). The first data set consists of thyroid cases that had been run through the GARVAN-ES1 KBS to ensure consistent classification. The second is the tic-tac-toe data set from the UC Irvine (UCI) machine learning repository. In each experiment the system starts with an empty KB and processes cases 1 to 15,000 of GARVAN-ES1 and cases 1 to 700 of tic-tac-toe in order. The last 6,822 cases of GARVAN-ES1 and the last 300 cases of tic-tac-toe are held out as test data.
| Domain | Total cases | Test cases | Type of collection |
|---|---|---|---|
| GARVAN-ES1 | 21,822 | Last 6,822 | Historical collection of natural data |
| Tic-tac-toe | 1,000 | Last 300 | Randomized data |

Table 1. Domains used in the evaluation.

Systems built using these cases should bear some resemblance to a real system, although the two data sets differ in origin: GARVAN-ES1 is a historical collection of natural data, while the tic-tac-toe cases are randomized.
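The GARVAN-ES1 split can be expressed as a small Python sketch. It assumes the cases are held in a list in their original order; `split_cases` is a hypothetical helper written for illustration, not part of the original experimental software. The first `n_train` cases are the ones presented to the expert one at a time, and the last `n_test` cases are held out for testing.

```python
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def split_cases(cases: Sequence[T], n_train: int, n_test: int) -> tuple[Sequence[T], Sequence[T]]:
    """Split cases, kept in their original order, into (training, test).

    The first n_train cases are presented to the expert one at a time; the
    last n_test cases are held out. Raises if the two sets would overlap.
    """
    if n_train + n_test > len(cases):
        raise ValueError("training and test sets would overlap")
    return cases[:n_train], cases[len(cases) - n_test:]


# GARVAN-ES1: placeholder case ids 1..21,822 stand in for the real cases.
garvan = list(range(1, 21_823))
train, test = split_cases(garvan, n_train=15_000, n_test=6_822)
print(len(train), len(test), train[-1], test[0])  # 15000 6822 15000 15001
```

Because 15,000 + 6,822 = 21,822, the two sets exactly partition the GARVAN-ES1 data; the overlap check makes that assumption explicit.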
To simplify the experimental design, we chose to add new rules at either the top or the bottom of rule pathways. We expected that adding rules at the top would be the worse situation with respect to the number of cornerstone cases the expert must see. The data presented here cover only rules added at the top of pathways.
Results
1) Error Rate
Error rates are measured by running the test cases in the database through the KB. Note that the default error rate for GARVAN-ES1 is 22.7%, since the null classification is correct for 77.3% of its cases; the corresponding figure for the tic-tac-toe data set is 62.6%. Initially, the moderate expert, the stupid expert and the clever expert RDR all have higher error rates than Induct or the clever expert MCRDR. While the stupid expert's error rate remains high, the other methods all become comparable to Induct once sufficient cases have been seen. Note that, for Induct as well as for the other methods, the error rate continues to fall as more cases are added to the training set; as Catlett (1992) showed, this is a common phenomenon when induction is applied to very large training sets.
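The default error rate is the error of an empty KB, which returns the null classification (no conclusions) for every case. The Python sketch below assumes each case's correct classification is stored as a set of conclusions, with the empty set meaning the null classification; `default_error_rate` and the label `conclusion_a` are hypothetical, and the toy data are constructed to match the 77.3% figure rather than drawn from GARVAN-ES1.

```python
from collections.abc import Sequence


def default_error_rate(correct_classifications: Sequence[frozenset[str]]) -> float:
    """Error rate of an empty KB, which gives the null classification (no
    conclusions) for every case: the fraction of cases whose correct
    classification is not null."""
    if not correct_classifications:
        raise ValueError("no cases")
    # An empty frozenset is falsy, so this counts cases whose correct answer is null.
    null_correct = sum(1 for c in correct_classifications if not c)
    return 1 - null_correct / len(correct_classifications)


# Toy data (hypothetical labels): the null classification is correct for 773 of
# 1,000 cases, matching the 77.3% reported for GARVAN-ES1.
cases = [frozenset()] * 773 + [frozenset({"conclusion_a"})] * 227
print(round(default_error_rate(cases), 3))  # 0.227
```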
2) Cost of increased knowledge acquisition
Although the error rates for all but the stupid expert are reasonable, the critical question is whether these results are achieved at the cost of increased knowledge acquisition effort. The performance of the MCRDR systems is at least as good as that of an RDR system. A moderate expert RDR system is not shown here, but in earlier studies the final moderate expert KB was 50% larger than the clever expert KB in the GARVAN-ES1 domain. Here the sizes are comparable. The final KB sizes are shown in Table 2.
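| Domain | Stupid Expert | Moderate Expert | Clever Expert | Clever Expert RDR | Induct RDR |
|---|---|---|---|---|---|
| GARVAN-ES1 | 2,300 | 974 | 805 | 924 | 332 |
| Tic-Tac-Toe | 233 | 99 | 75 | 114 | 29 |

Table 2. Number of rules in the final KBs. The Stupid, Moderate and Clever Expert columns are MCRDR systems; the last two columns are RDR systems built by the clever expert and by Induct.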
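The size comparison can be checked directly against Table 2. This short Python calculation, an illustration rather than part of the original study, computes how much larger the moderate expert MCRDR KB is than the clever expert MCRDR KB in each domain; `relative_size` and the dictionary keys are hypothetical names.

```python
# Final KB sizes (number of rules), copied from Table 2.
RULES = {
    "GARVAN-ES1": {"stupid": 2300, "moderate": 974, "clever": 805, "clever_rdr": 924, "induct_rdr": 332},
    "Tic-Tac-Toe": {"stupid": 233, "moderate": 99, "clever": 75, "clever_rdr": 114, "induct_rdr": 29},
}


def relative_size(domain: str, larger: str, smaller: str) -> float:
    """Fractional size difference between two KBs: 0.5 means `larger` has 50% more rules."""
    sizes = RULES[domain]
    return sizes[larger] / sizes[smaller] - 1


for domain in RULES:
    print(domain, f"{relative_size(domain, 'moderate', 'clever'):.0%}")
# GARVAN-ES1 21%
# Tic-Tac-Toe 32%
```

Both differences (about 21% and 32%) are well below the 50% gap reported for the GARVAN-ES1 RDR KBs in the earlier studies.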
3) Complexity of the knowledge acquisition task
The number of difference lists from which the expert must select conditions to make a sufficiently precise rule is very small, as shown in Table 3. If we exclude the stupid expert as unrepresentative of a human expert, the expert has to see fewer than 3 difference lists per rule on average (between 1.84 and 2.55 across the two domains). With a clever expert the worst case is an average of 3 cases per rule, and with a moderate expert an average of 4 or 5 cases per rule, in both domains.
| GARVAN-ES1 | Average cases/rule | Maximum cases/rule |
|---|---|---|
| Moderate expert | 2.55 | 13 |
| Clever expert | 1.84 | 14 |
| Stupid expert | 3.22 | 17 |

| Tic-tac-toe | Average cases/rule | Maximum cases/rule |
|---|---|---|
| Moderate expert | 2.25 | 8 |
| Clever expert | 1.87 | 7 |
| Stupid expert | 2.45 | 10 |

Table 3. Cornerstone cases (difference lists) seen by the expert when adding rules. "Average cases/rule" is the average number of cornerstone cases or difference lists the expert saw per rule added; "Maximum cases/rule" is the largest number seen for any single rule.
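To make the difference-list count concrete, here is a minimal Python sketch of the selection loop under simplifying assumptions: cases are sets of true boolean features, a rule is a conjunction of conditions, and a difference list holds the conditions true for the new case but not for a cornerstone case, including negations. The names `difference_list`, `satisfies`, `build_rule` and the `choose` callback (a stand-in for the expert) are hypothetical, and the loop is a simplification, not the exact MCRDR interface: whenever a cornerstone case still satisfies the rule, its difference list is shown and one condition from it is added, so the returned count corresponds to the per-rule counts in the table above.

```python
from collections.abc import Callable, Iterable

Case = frozenset[str]  # assumption: a case is the set of boolean features true for it


def difference_list(new_case: Case, cornerstone: Case) -> set[str]:
    """Conditions true for the new case but false for the cornerstone case:
    features only the new case has, plus negations ("not X") of features only
    the cornerstone case has. Any one of them excludes this cornerstone case."""
    only_new = set(new_case - cornerstone)
    only_cornerstone = {f"not {f}" for f in cornerstone - new_case}
    return only_new | only_cornerstone


def satisfies(case: Case, conditions: Iterable[str]) -> bool:
    """True if every condition holds for the case (a rule is a conjunction)."""
    for cond in conditions:
        if cond.startswith("not "):
            if cond[len("not "):] in case:
                return False
        elif cond not in case:
            return False
    return True


def build_rule(
    new_case: Case,
    cornerstones: list[Case],
    choose: Callable[[set[str]], str],
) -> tuple[set[str], int]:
    """Add conditions until no cornerstone case satisfies the rule.
    Returns the rule's conditions and the number of difference lists shown."""
    conditions: set[str] = set()
    lists_seen = 0
    while True:
        reaching = [c for c in cornerstones if satisfies(c, conditions)]
        if not reaching:
            return conditions, lists_seen
        diff = difference_list(new_case, reaching[0])
        if not diff:
            raise ValueError("cornerstone case is indistinguishable from the new case")
        lists_seen += 1
        conditions.add(choose(diff))


# Hypothetical features a-e; min() is a deterministic stand-in for the expert.
new = frozenset({"a", "b", "c"})
cornerstones = [frozenset({"a", "b"}), frozenset({"a", "d"}), frozenset({"b", "c", "e"})]
rule, seen = build_rule(new, cornerstones, choose=min)
print(sorted(rule), seen)  # ['a', 'c'] 2
```

Here two difference lists suffice before no cornerstone case satisfies the rule. A real expert would choose conditions on domain grounds rather than alphabetically.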