CART (Classification and Regression Trees)
A study of patients admitted after a heart attack: 19 variables were collected during the first 24 hours for 215 patients (only those who survived the first 24 hours).
Question: can the high-risk patients (those who will not survive 30 days) be identified?

Impurity of a Node
We need a measure of the impurity of a node to help decide how to split a node, or which node to split.
The measure should be at its maximum when a node is divided equally among all classes.
The impurity should be zero if the node contains only one class.

Predictor variables can be continuous or categorical.
A classification tree is created if the response variable is categorical.
A regression tree is created if the response variable is continuous.
A large sample size is needed to split efficiently when there are many predictors.
Interactions between predictors can be identified.
The relative importance of predictors cannot be identified well.
Missing observations form a separate category.

Resubstitution Costs
It is the error of the tree estimat...
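The two requirements stated under Impurity of a Node (largest when a node is split evenly among the classes, zero when it holds a single class) are met by common measures such as the Gini index and entropy. The notes do not name a particular measure, so the Python sketch below assumes these two. The function names `gini`, `entropy` and `impurity_decrease` are hypothetical, and the labels are toy values, not data from the heart attack study. `impurity_decrease` shows one way a measure can guide a split: a candidate split is scored by how much it lowers the case-weighted impurity of the children relative to the parent.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity, 1 - sum_k p_k^2 (assumed measure, one common choice).
    Zero for a single-class node, largest when classes are evenly mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits, sum_k p_k * log2(1 / p_k) (assumed alternative measure).
    Only classes present in the node are counted, so 0 * log(0) never occurs."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, measure=gini):
    """Hypothetical split score: parent impurity minus the impurity of the
    two children, each weighted by its share of the parent's cases."""
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted


# Toy labels for illustration only (not data from the study).
pure = ["survived"] * 4
mixed = ["survived", "survived", "died", "died"]

print(gini(pure), gini(mixed))                        # 0.0 0.5
print(entropy(mixed))                                 # 1.0
print(impurity_decrease(mixed, mixed[:2], mixed[2:]))  # 0.5
```

Here a split that perfectly separates an evenly mixed two-class node removes all of its Gini impurity (0.5). A tree-growing routine would compare such scores across candidate splits and choose the largest.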