I have a data set in which one column has 142 unique values. As part of building a predictive model, I want to create dummy variables for that column. But instead of creating 142 dummy variables, I first want to group together the values that behave similarly with respect to the response variable. The code I used is below:
round(tapply(train_data$Price,train_data$Suburb,mean),0)
This gives me an array with 142 elements, and going through them manually to find similar values is time-consuming. A snippet of my output is pasted below:
round(tapply(train_data$Price,train_data$Suburb,mean),0)
Abbotsford Aberfeldie Airport West
1057934 1235150 707542
Albert Park Albion Alphington
1919014 547711 1188880
Altona Altona North Armadale
757866 728127 1542430
Ascot Vale Ashburton Ashwood
968702 1595275 1049184
Avondale Heights Balaclava Balwyn
792321 675133 1912896
Balwyn North Bellfield Bentleigh
1769984 798778 1282869
Bentleigh East Box Hill Braybrook
1038886 1138650 646845
Brighton Brighton East Brooklyn
1864928 1607299 542182
Brunswick Brunswick East Brunswick West
952350 874927 744986
Bulleen Burnley Burwood
1142944 1150902 1167023
Camberwell Campbellfield Canterbury
1761263 447600 2284188
Carlton Carlton North Carnegie
1062721 1436615 915587
Caulfield Caulfield East Caulfield North
981417 1099000 1055575
Caulfield South Chadstone Clifton Hill
1119571 1007909 1049742
Coburg Coburg North Collingwood
851215 770902 858415
Cremorne Docklands Doncaster
943731 937500 1210059
Eaglemont East Melbourne Elsternwick
How can I write code that groups all the values by their mean, using ranges such as 600000-699999, 700000-799999, and so on?
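One possible approach, sketched here under assumptions rather than as a definitive answer, is base R's `cut()`, which assigns each value of a numeric vector to an interval defined by `breaks`. The sketch computes the per-suburb means as in the question, bins them into 100000-wide bands (the `width` value is an assumption you can change), maps each row back to its band, and builds the dummy columns with `model.matrix()`. The small data frame is hypothetical toy data standing in for the real `train_data`; only the `Suburb` and `Price` column names come from the question, and the new column name `SuburbGroup` is also hypothetical.

```r
# Hypothetical toy data standing in for the real train_data;
# only the column names Suburb and Price come from the question.
train_data <- data.frame(
  Suburb = c("Albion", "Albion", "Brooklyn", "Altona", "Altona North",
             "Balwyn", "Balwyn North"),
  Price  = c(540000, 555422, 542182, 757866, 728127, 1912896, 1769984),
  stringsAsFactors = FALSE
)

# Mean price per suburb, as in the question (a named 1-d array)
suburb_means <- round(tapply(train_data$Price, train_data$Suburb, mean), 0)

# Bin edges every 100000 (assumed width), from the band containing
# the smallest mean up to just above the largest mean
width  <- 100000
breaks <- seq(floor(min(suburb_means) / width) * width,
              floor(max(suburb_means) / width) * width + width,
              by = width)

# Labels such as "600000-699999" for the interval [600000, 700000)
labels <- sprintf("%.0f-%.0f", head(breaks, -1), tail(breaks, -1) - 1)

# right = FALSE makes each interval closed on the left: [lower, upper)
suburb_group <- cut(as.vector(suburb_means), breaks = breaks,
                    labels = labels, right = FALSE)

# Lookup table: suburb name -> price band
group_lookup <- setNames(as.character(suburb_group), names(suburb_means))

# Map each row's suburb to its band; drop bands that contain no suburb
train_data$SuburbGroup <- droplevels(factor(
  unname(group_lookup[as.character(train_data$Suburb)]),
  levels = labels
))

# One 0/1 dummy column per price band (no intercept column)
dummies <- model.matrix(~ SuburbGroup - 1, data = train_data)
print(table(train_data$SuburbGroup))
```

With this toy data, Albion and Brooklyn both fall into the 500000-599999 band and Altona and Altona North into 700000-799999, so `dummies` ends up with four columns (one per non-empty band) instead of one per suburb. `right = FALSE` is used so that a mean of exactly 600000 lands in 600000-699999, matching the ranges in the question.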