Classification trees in Gleam

Last update: 06/01/2010 15:46 -0700


The previous status was reported here. In summary, we support Insightful Miner (IM) prediction trees (derived from Classification or Regression Trees) using code in the classification package. After Bill Atwood makes his analysis available, a special subset worksheet is constructed containing only the prediction analysis. This XML file is checked into CVS along with the package: see it here.

The connection to Gleam is via a class in the merit package, ClassificationTree. It manages the set of (now 17) prediction trees: it asks classification::Tree to parse them from the IM file, arranges the logic to invoke them with values from the AnalysisTuple, and sets five new variables in return:

Name           # trees  Nominal cut   Description
IMgoodCalProb  3        > 0.5         Probability that there was a good calorimeter energy measurement
IMvertexProb   2        -             Probability that the photon direction inferred from a vertex is better than the highest energy track
IMcoreProb     4        > 0.1         Probability that the event is not in the PSF tail
IMpsfErrPred   4        -             Prediction of the PSF error
IMgammaProb    4        > 0.5 or 0.9  Probability that the event is not background
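
As a rough illustration of what evaluating a single prediction tree involves, the sketch below walks a binary tree from the root to a leaf, comparing one tuple value against a threshold at each node. The Node layout, the evaluate function, and the use of a std::map for the tuple are all assumptions made for this example; they are not the interface of classification::Tree or AnalysisTuple, and how the several trees behind each variable are combined is not shown.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node layout for illustration only; this is not the
// representation used by classification::Tree.
struct Node {
    std::string variable;  // name of the tuple variable tested at this node
    double threshold;      // go left if value < threshold, otherwise right
    int left;              // index of the left child, or -1 for a leaf
    int right;             // index of the right child, or -1 for a leaf
    double value;          // prediction stored at a leaf
};

// Walk one tree from the root (index 0) down to a leaf and return the
// value stored there. Throws std::out_of_range if a tested variable is
// missing from the tuple.
double evaluate(const std::vector<Node>& tree,
                const std::map<std::string, double>& tuple)
{
    std::size_t i = 0;
    while (tree[i].left >= 0) {
        const double x = tuple.at(tree[i].variable);
        const int next = (x < tree[i].threshold) ? tree[i].left : tree[i].right;
        i = static_cast<std::size_t>(next);
    }
    return tree[i].value;
}
```

A tree is stored here as a flat vector with child indices, which keeps the example short; a parser for the IM file could fill such a vector node by node.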

Since the previous setup, the Cal cut has become much more detailed, and background rejection has been added as a separate worksheet, intended to analyze the output of the PSF analysis/tail suppression. We combined the prediction portions of these worksheets into one, which can be seen here.

The logic to calculate these is wired in, mimicking the code in the IM worksheet, except that this code does not reject events; the expectation is that rejection will be done by further analysis of the five quantities above. (The cut on IMgammaProb is somewhat complicated, however.)
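
Since rejection is left to later analysis, a downstream selection might look something like the following sketch, which applies the nominal cuts from the table. The TreeOutputs struct and passesNominalCuts function are hypothetical names introduced for this example. The single threshold on IMgammaProb is a deliberate simplification, passed in as a parameter because the nominal value is given as 0.5 or 0.9 and the real cut is more involved.

```cpp
// Hypothetical container for the five quantities set by the trees.
struct TreeOutputs {
    double IMgoodCalProb;
    double IMvertexProb;   // no nominal cut
    double IMcoreProb;
    double IMpsfErrPred;   // no nominal cut
    double IMgammaProb;
};

// Apply the nominal cuts from the table. gammaCut would be 0.5 or 0.9;
// treating the IMgammaProb cut as one threshold is an assumption made
// for this sketch.
bool passesNominalCuts(const TreeOutputs& o, double gammaCut)
{
    return o.IMgoodCalProb > 0.5
        && o.IMcoreProb > 0.1
        && o.IMgammaProb > gammaCut;
}
```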

Current Status

All the above is operational, and part of Gleam v3r3p3. However, none of it is verified. Verification requires comparing these values with the variables generated by the IM worksheet, a task that took several days before, and which uncovered several errors.

Another Issue

Bill has devised a new pruner for selecting candidates for further study from large background runs. Since it is an IM prediction tree, there is no way to apply it in the Gleam environment other than with new specialized code as described above. Without this, he is unable to refine the current background rejection.

Future Possibilities

The situation is rather inflexible and basically only accessible to those with IM and knowledge of the classification package. (Size of this set: one.) The classification tree (CT) technology itself is not restricted to IM users, however, since it is also supported by R and S-Plus.

Another idea would be to use a new feature in IM version 3, which saves the prediction tree in a simple ASCII format, and to write code to interpret it. This could replace the XML parsing code in classification::Tree. There are functions in R or S-Plus that allow