I have a small data set (<1 million records) that should be easy to model, but one variable is giving me problems: [store], a categorical variable with ~98 levels holding the name of each store. I am trying to predict each store's sales, [sales], which is a continuous numeric variable. With [store] in the model, the vector size is over 10 GB and R crashes with memory errors. Is it possible to fit 98 separate regression equations, one for each level of [store], and run them one by one?
My other idea is to group the [store] levels into 10 or 15 clusters, then use the cluster label as the categorical variable when predicting [sales].
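A rough sketch of the clustering idea, again in Python for illustration only. It assumes the stores are clustered on a single feature, their mean sales, using a plain 1-D k-means. That feature choice, the store names, and the numbers are assumptions made for the example, not something fixed by the problem.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (store, sales) records; the real data would have ~98 stores.
records = [
    ("s1", 10.0), ("s1", 12.0), ("s2", 11.0), ("s2", 13.0),
    ("s3", 50.0), ("s3", 54.0), ("s4", 49.0), ("s4", 51.0),
    ("s5", 100.0), ("s5", 104.0),
]

def cluster_stores(records, k, n_iter=20):
    """Group stores into k clusters with 1-D k-means on mean sales per store."""
    sales_by_store = defaultdict(list)
    for store, sales in records:
        sales_by_store[store].append(sales)
    store_mean = {s: mean(v) for s, v in sales_by_store.items()}

    # Start the centers at evenly spaced positions in the sorted store means.
    ordered = sorted(store_mean.values())
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]

    for _ in range(n_iter):
        # Assign each store to its nearest center.
        labels = {s: min(range(k), key=lambda j: abs(m - centers[j]))
                  for s, m in store_mean.items()}
        # Move each center to the mean of its stores (unchanged if the cluster is empty).
        for j in range(k):
            members = [store_mean[s] for s, lab in labels.items() if lab == j]
            if members:
                centers[j] = mean(members)
    return labels

labels = cluster_stores(records, k=3)
print(labels)  # maps each store name to a cluster index
```

The cluster index would then replace [store] as the categorical predictor. Because this sketch builds the clusters from [sales] itself, the store means should be computed on training data only, otherwise the predictor leaks the target.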