I have a data.frame with 9 columns and over 16k rows. I need to split it into 50 groups by one column, Subj_Avg_Home, which is in ascending order. The groups should be roughly equal in size.

I searched existing posts for an answer.

My data frame looks something like this.

head(thesisdata)
   ID Home_Score Away_Score      Subj_Avg_Home
48550          4          2 0.0181413635731533
30965          4          1  0.016167700385985
40501          5          1 0.0185671994247871
41771          3          5 0.0186986545666144
42138          3          4   0.01900475916696
42975          4          7 0.0202611448135552
43724          1          1 0.0204169805144118
47592          4          3  0.020769733472299
47201          3          4 0.0207922542122643

If I split this sample into three groups, I would like the result to look like this:

head(group 1)
   ID Home_Score Away_Score      Subj_Avg_Home
48550          4          2 0.0181413635731533
30965          4          1  0.016167700385985
40501          5          1 0.0185671994247871

head(group 2)
   ID Home_Score Away_Score      Subj_Avg_Home
41771          3          5 0.0186986545666144
42138          3          4   0.01900475916696
42975          4          7 0.0202611448135552

head(group 3)
   ID Home_Score Away_Score      Subj_Avg_Home
43724          1          1 0.0204169805144118
47592          4          3  0.020769733472299
47201          3          4 0.0207922542122643
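
One way to get groups like these, as a minimal sketch rather than a definitive answer: rank the rows by Subj_Avg_Home and map the ranks onto group numbers, so that group sizes differ by at most one row. The names `n_groups`, `group_id` and `groups` are illustrative and do not come from the original data; the data frame is rebuilt from the sample rows above so the example runs on its own.

```r
# Sample rows copied from the question.
thesisdata <- data.frame(
  ID = c(48550, 30965, 40501, 41771, 42138, 42975, 43724, 47592, 47201),
  Home_Score = c(4, 4, 5, 3, 3, 4, 1, 4, 3),
  Away_Score = c(2, 1, 1, 5, 4, 7, 1, 3, 4),
  Subj_Avg_Home = c(0.0181413635731533, 0.016167700385985, 0.0185671994247871,
                    0.0186986545666144, 0.01900475916696, 0.0202611448135552,
                    0.0204169805144118, 0.020769733472299, 0.0207922542122643)
)

n_groups <- 3  # assumption for this sample; the full data set would use 50

# Rank every row by Subj_Avg_Home (ties broken by row order), then map
# ranks 1..N onto group numbers 1..n_groups of near-equal size.
r <- rank(thesisdata$Subj_Avg_Home, ties.method = "first")
group_id <- ceiling(r * n_groups / nrow(thesisdata))

# split() returns a named list with one data frame per group.
groups <- split(thesisdata, group_id)
```

Here `groups[["1"]]` holds the three rows with the smallest Subj_Avg_Home values, `groups[["2"]]` the next three, and so on. Because the grouping uses ranks rather than row positions, it does not depend on the data frame already being sorted.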

I am sorry if my formatting is not optimal!

user3168701
  • As suggested [here](http://stackoverflow.com/questions/18139708/split-data-frame-into-rows-of-fixed-size), you could try `split(df, (seq_len(nrow(df)) - 1) %/% 3)` – Steven Beaupré Oct 25 '16 at 17:40
  • That doesn't group it by quantiles though – Allen Wang Oct 25 '16 at 17:52
  • I was just about to post this answer: `quantile_generator <- function(input_vector, quantiles = 4) { tiles <- quantile(input_vector, probs = seq(0, 1, 1/quantiles)); temp <- rep(NA, length(input_vector)); for (i in 2:length(tiles)) { temp[input_vector <= tiles[i] & is.na(temp)] <- i }; return(temp) }; tempvec <- runif(100, 0, 1); df <- data.frame(v1 = tempvec, v2 = quantile_generator(tempvec, 4)); split(df, df$v2)` – Allen Wang Oct 25 '16 at 17:53
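
The quantile idea in the last comment can probably be written without the loop by passing the quantile breakpoints to `cut()`. The following is a sketch under assumptions, not the commenter's code: the seed, the toy vector `x` and the names `breaks`, `bin` and `pieces` are made up for illustration.

```r
set.seed(1)      # arbitrary seed so the toy data is reproducible
x <- runif(100)  # stand-in for a column such as Subj_Avg_Home
n_groups <- 4    # the question would use 50

# Breakpoints at the 0%, 25%, 50%, 75% and 100% quantiles.
breaks <- quantile(x, probs = seq(0, 1, length.out = n_groups + 1))

# Assign each value to the interval it falls in; include.lowest = TRUE
# keeps the minimum in the first bin, labels = FALSE returns integer codes.
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

df <- data.frame(v1 = x, v2 = bin)
pieces <- split(df, df$v2)
```

One caveat: if the column has many tied values, two breakpoints can coincide and `cut()` stops with an error about non-unique breaks, and the groups may not be equal in size. In that case a rank-based grouping is likely the safer choice.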

0 Answers