I have a homework assignment where I have to find the best possible classification model for a dataset. My training set consists of 733 observations with 90,000 variables each.
My problem is the following: whenever I try to perform an operation on the dataset (mice, rpart, ...), I get the error "cannot allocate vector of size x Gb", with x being huge (30-60 Gb).
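To show the shape of the problem, here is roughly the kind of call that fails. The data below are random placeholders at a tenth of the real width (the real set has 90,000 columns and is not random); I only include this to illustrate the structure:

```r
library(rpart)

# Random placeholder data with the same structure as my training set,
# but only 9000 of the 90000 columns (the real data are not random).
set.seed(1)
train <- as.data.frame(matrix(rnorm(733 * 9000), nrow = 733))
train$Class <- factor(sample(c("yes", "no"), 733, replace = TRUE))

# On the real 90000-column data, calls like this one fail with
# "cannot allocate vector of size ... Gb":
fit <- rpart(Class ~ ., data = train, method = "class")
```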
My question is: how can I deal with such a huge dataset?
Since there are few observations but a lot of feature variables, I believe a solution could be to derive new feature variables from the existing ones in order to reduce the number of variables, but I don't know whether that is possible in R or whether it would be statistically sound.
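What I have in mind is something like principal component analysis; I am not sure it is the right tool, this is only a sketch of the idea, continuing with the placeholder `train` from above:

```r
# Sketch only: derive a small number of new variables from the original
# ones with PCA, then train on those instead. The choice of 50 components
# is arbitrary.
predictors <- train[, names(train) != "Class"]
pca <- prcomp(predictors, center = TRUE, scale. = TRUE, rank. = 50)

# New training set: 50 derived variables plus the outcome
train_reduced <- data.frame(pca$x, Class = train$Class)
dim(train_reduced)  # 733 x 51

fit2 <- rpart(Class ~ ., data = train_reduced, method = "class")
```

Is something along these lines a reasonable way to go, or is there a better approach?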
I did some research on the Internet but found nothing that helps me. I would be very grateful if someone could help. It may be useful to mention that I have very little knowledge of R and statistics in general.
Thanks in advance for your response!