CART
CART is nonparametric
CART does not require variables to be selected in advance.
The CART algorithm itself identifies the most significant variables and eliminates
non-significant ones.
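The selection step can be illustrated with a minimal sketch of a single regression split. It assumes the usual least-squares criterion (the split that minimises the summed squared error of the two child nodes wins); the helper names `sse` and `best_split` and the toy data are hypothetical, and recursion and pruning, which full CART also performs, are left out.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y):
    """Return (feature_index, threshold, sse_after_split) of the best single split."""
    best = None
    n_features = len(rows[0])
    for j in range(n_features):
        levels = sorted(set(row[j] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(rows, y) if row[j] <= t]
            right = [yi for row, yi in zip(rows, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best


# Hypothetical toy data: column 0 drives the response, column 1 is unrelated to it.
rows = [(1, 7), (2, 3), (3, 9), (4, 1), (5, 8), (6, 2), (7, 6), (8, 4)]
y = [10, 11, 10, 11, 20, 21, 20, 21]

feature, threshold, remaining = best_split(rows, y)
print(feature, threshold, remaining)
```

Both variables are offered to the search, but only the informative one is chosen for the split. A variable that never wins a split simply does not appear in the tree.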
CART results are invariant to monotone transformations of its independent variables.
Replacing one or several variables with their logarithm or square root will not change
the structure of the tree. Only the splitting values (but not the variables) in the
questions will be different.
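The invariance can be checked on a small example. The sketch below assumes a least-squares split on a single variable with thresholds placed at midpoints between neighbouring values; `best_partition` and the data are hypothetical. It compares the split found on the raw variable with the one found on its logarithm.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_partition(x, y):
    """Return (threshold, indices_in_left_node) for the least-squares split on x."""
    levels = sorted(set(x))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [i for i, xi in enumerate(x) if xi <= t]
        right = [i for i, xi in enumerate(x) if xi > t]
        total = sse([y[i] for i in left]) + sse([y[i] for i in right])
        if best is None or total < best[0]:
            best = (total, t, tuple(left))
    return best[1], best[2]


# Hypothetical toy data with a jump in the response between x = 8 and x = 16.
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
y = [5.0, 6.0, 5.5, 6.5, 30.0, 31.0, 29.0, 32.0]

t_raw, left_raw = best_partition(x, y)
t_log, left_log = best_partition([math.log(v) for v in x], y)

print("raw:", t_raw, left_raw)
print("log:", t_log, left_log)
```

The same observations end up in the left node in both runs. Only the threshold changes, because it is now expressed on the log scale.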
CART can easily handle outliers.
Outliers can negatively affect the results of some statistical models, like Principal
Component Analysis (PCA) and linear regression. But the splitting algorithm of
CART will easily handle noisy data: CART will isolate the outliers in a separate
node.
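A minimal sketch of this behaviour, again assuming a single least-squares split and hypothetical data: one observation is extreme in both the predictor and the response, and the best split places it alone in its own node. An outlier in the middle of the predictor range would need two splits to be isolated, which this one-split sketch does not show.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_once(x, y):
    """Return (threshold, left_responses, right_responses) for the best split on x."""
    levels = sorted(set(x))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, t, left, right)
    return best[1], best[2], best[3]


# Hypothetical data: the last observation is an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
y = [2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 1.8, 2.1, 2.0, 40.0]

threshold, left, right = split_once(x, y)

print("threshold:", threshold)
print("right node:", right)
print("overall mean:", sum(y) / len(y))
print("left node mean:", sum(left) / len(left))
```

The overall mean is pulled upward by the outlier, while the mean of the left node reflects only the nine ordinary observations.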
Boston Housing is a classic dataset that is well suited to regression trees. It has
13 independent variables and one response variable, the value of the house
(variable number 14).
The Boston Housing dataset consists of 506 observations and includes the following variables:
1. crime rate
2. percent of land zoned for large lots
3. percent of non-retail business
4. Charles River indicator, 1 if on Charles River, 0 otherwise
5. nitrogen oxide concentration
6. average number of rooms
7. percent built before 1940
8. weighted distance to employment centers
9. accessibility to radial highways
10. tax rate
11. pupil-teacher ratio
12. percent black
13. percent lower status
14. median value of owner-occupied homes in thousands of dollars