I am trying to use CART to analyse a data set in which each row is a segment, for example:

Segment_ID | Attribute_1 | Attribute_2 | Attribute_3 | Attribute_4 | Target
1          | 2           | 3           | 100         | 3           | 0.1
2          | 0           | 6           | 150         | 5           | 0.3
3          | 0           | 3           | 200         | 6           | 0.56
4          | 1           | 4           | 103         | 4           | 0.23

Each segment has a certain population from the base data (irrelevant to my final use).

I want to condense the segments into a few big segments, based on the 4 attributes and on the target variable; in the example above, the 4 segments would become 2. I am currently dealing with 15k segments and want only 10 final segments, each defined by the target and also having a sensible attribute distribution.

Now, pardon me if I am wrong, but CHAID in SPSS (if not using autogrow) will generally split the data in a 70:30 ratio: it builds the tree on 70% of the data and tests on the remaining 30%. I can't use this approach since I need all the segments in my data to be included. I essentially want to club these segments into a few big segments, as explained above. My question is whether I can use CART (rpart in R) for this. There is an explicit 'subset' argument in the rpart function in R, but I am not sure whether leaving it out ensures that CART uses 100% of my data. I am relatively new to R, hence the very basic question.
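To make the question concrete, here is a minimal sketch of the kind of call being asked about, using the four example rows from the table. The data frame name `segments`, the column `big_segment`, and all `rpart.control` values are assumptions chosen only so that a 4-row toy set can split; they are not settings recommended for the real 15k segments. The sketch omits `subset` and then checks how many rows the fitted tree reports at its root.

```r
library(rpart)

# Toy data copied from the example table (names are illustrative)
segments <- data.frame(
  Segment_ID  = 1:4,
  Attribute_1 = c(2, 0, 0, 1),
  Attribute_2 = c(3, 6, 3, 4),
  Attribute_3 = c(100, 150, 200, 103),
  Attribute_4 = c(3, 5, 6, 4),
  Target      = c(0.1, 0.3, 0.56, 0.23)
)

# `subset` is deliberately not supplied, so no rows are filtered out by it.
# Control values are assumptions for the toy data: allow splits on tiny nodes,
# stop after one split (two leaves), and skip internal cross-validation.
fit <- rpart(
  Target ~ Attribute_1 + Attribute_2 + Attribute_3 + Attribute_4,
  data    = segments,
  method  = "anova",
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0,
                          maxdepth = 1, xval = 0)
)

# Number of observations in the root node versus rows in the data
fit$frame$n[1]
nrow(segments)

# fit$where holds the terminal node each row ended up in,
# which can serve as the id of the combined "big segment"
segments$big_segment <- fit$where
table(segments$big_segment)
```

If the two counts match, every row was used in the fit. Rows can still be dropped by the default `na.action` when the target is missing, and the `xval` setting only controls the cross-validated error estimates in the complexity table, not which rows the final tree is built on. For the real data, one possible route to roughly 10 final segments is to grow a larger tree and cut it back with `prune()`, picking a `cp` value from `fit$cptable`; an exact leaf count is not guaranteed.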

Bas
Jatin
  • You could improve this question and make it easier to understand what you're asking by providing a minimal, reproducible example, including your CART code. See http://stackoverflow.com/q/5963269 for good pointers! – BenBarnes Nov 09 '15 at 15:20
