
I have a data frame looking like this:

Grade   Class_Dept   Class_Name   Class_Work
9       English      English 1    30
10      History      Modern World 50
11      Science      AP Chem      85
12      Math         Calc BC      45

It extends further than that, but that's the general idea. I would like to split this into multiple smaller data frames by Class_Name. I tried using plyr, but couldn't figure it out. I also tried the split() function, which worked, but did not allow me to index into each sub-dataframe in a for loop. Is there any other way I can do this? Any help would be appreciated.

Also, the split() function would work if I could index into each sub-dataframe. If that doesn't make sense, what I would want to do is get the mean and standard deviation of the Class_Work for each Class_Name and compare them. I could do this manually with the list returned from split(), but it would take a long time, as my dataframe has about 120 different classes. If there's a way to automate this, that would be fantastic.

Kai036
  • You can index into each sub-dataframe: `ls2 <- split(mtcars,mtcars$cyl); mean(ls2[[1]]$mpg)`. What is the error in the for loop? – Mike Nov 21 '19 at 19:57
  • Thank you so much! I wasn't doing `ls2[[1]]`, I was doing `ls2[1]` instead. Could you explain why the extra set of brackets fixes it? – Kai036 Nov 21 '19 at 20:02
  • `[` vs `[[`: https://stackoverflow.com/q/1169456/5325862 – camille Nov 21 '19 at 20:19
  • Does this answer your question? [Split a large dataframe into a list of data frames based on common value in column](https://stackoverflow.com/questions/18527051/split-a-large-dataframe-into-a-list-of-data-frames-based-on-common-value-in-colu) – M-- Nov 21 '19 at 20:35
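
To illustrate the point raised in the comments, here is a minimal sketch using the built-in mtcars data from Mike's comment. `[` returns a one-element list that still wraps the data frame, while `[[` returns the data frame itself, which is what a for loop needs. The names `means`, `sds`, and `sub_df` are illustrative only.

```r
# Split mtcars into a named list of data frames, one per cylinder count
ls2 <- split(mtcars, mtcars$cyl)

class(ls2[1])    # "list": a one-element list that still wraps the data frame
class(ls2[[1]])  # "data.frame": the sub-data-frame itself

# Loop over the list by name, extracting each data frame with [[
means <- numeric(0)
sds <- numeric(0)
for (nm in names(ls2)) {
  sub_df <- ls2[[nm]]
  means[nm] <- mean(sub_df$mpg)
  sds[nm] <- sd(sub_df$mpg)
}
means
sds
```

The same loop should work on the question's data by replacing `mtcars$cyl` with the `Class_Name` column and `mpg` with `Class_Work`.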

4 Answers


You can use dplyr::group_split()

library(dplyr)
iris %>%
    group_by(Species) %>%
    group_split()
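
One thing to be aware of is that group_split() returns an unnamed list, so the pieces can only be indexed by position. A possible way to index them by group label, sketched here with the same iris data, is to take the labels from group_keys(). The names `pieces` and `keys` are illustrative.

```r
library(dplyr)

# group_split() returns an unnamed list of tibbles, one per group
pieces <- iris %>%
  group_by(Species) %>%
  group_split()

# group_keys() returns the group labels for the same grouping
keys <- iris %>%
  group_by(Species) %>%
  group_keys()
names(pieces) <- as.character(keys$Species)

# Each element can now be extracted by name with [[
mean(pieces[["setosa"]]$Sepal.Length)
sd(pieces[["setosa"]]$Sepal.Length)
```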

yusuzech

If you're trying to split and loop, try split and lapply/vapply:

vapply(split(mtcars, mtcars$cyl), function(df) mean(df$mpg), double(1))
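
Since the question asks for both the mean and the standard deviation, the same pattern can return two values per group. This is a sketch on mtcars, not the question's data; `stats` is an illustrative name.

```r
# Split by group, then compute both statistics for each sub-data-frame.
# Each call returns a named numeric vector of length 2, so vapply()
# assembles a matrix with one column per group; t() flips it so that
# each group becomes a row.
stats <- t(vapply(
  split(mtcars, mtcars$cyl),
  function(df) c(mean = mean(df$mpg), sd = sd(df$mpg)),
  numeric(2)
))
stats

# Groups can then be compared directly, for example ordered by mean
stats[order(stats[, "mean"]), ]
```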
SmokeyShakers

It seems like the real goal is to collect summary data on your total dataset grouped by "Class_Name" and that it is really unnecessary to split into different data frames. There are several good options to perform this summary with both base R and with the dplyr package.

Below are examples using the split/sapply, tapply, and group_by/summarize techniques.

df<-read.table(header=TRUE, text='Grade   Class_Dept   Class_Name   Class_Work
9       English      "English 1"    30
10      History      "Modern World" 50
11      Science      "AP Chem"      85
12      Math         "Calc BC"      45')

#Base R solution
#split into a list of dataframes by Class_name
dflist<-split(df, df$Class_Name)
#perform math operation on each dataframe
workmean<-sapply(dflist, function(x){ mean(x$Class_Work)})
workstdev<-sapply(dflist, function(x){ sd(x$Class_Work)})

workmean
#   AP Chem      Calc BC    English 1 Modern World 
#        85           45           30           50 

#tapply option:
tapply(df$Class_Work, df$Class_Name, mean)
#     AP Chem      Calc BC    English 1 Modern World 
#          85           45           30           50 

#dplyr solution
library(dplyr)
df %>% group_by(Class_Name) %>% summarize(mean=mean(Class_Work), stdev=sd(Class_Work))
# # A tibble: 4 x 3
#   Class_Name    mean stdev
#   <fct>        <dbl> <dbl>
# 1 AP Chem         85   NaN
# 2 Calc BC         45   NaN
# 3 English 1       30   NaN
# 4 Modern World    50   NaN
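
The sample above has only one row per class, so the standard deviation is undefined for every group. As a rough sketch of another base R option, the following uses made-up data with three rows per class (`df2` and its values are hypothetical, not from the question) and aggregate() to get both statistics in one table.

```r
# Hypothetical data: three rows per class, so sd() is defined
df2 <- data.frame(
  Class_Name = rep(c("AP Chem", "Calc BC"), each = 3),
  Class_Work = c(80, 85, 90, 40, 45, 50)
)

# One row per class with both statistics
res <- aggregate(Class_Work ~ Class_Name, data = df2,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores the two statistics as a single matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns
res <- do.call(data.frame, res)
res
```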
Dave2e

You can use the data.table package:

library(data.table)
dt <- iris
# mean of Petal.Width and standard deviation of Sepal.Length for each species
setDT(dt)[,.(mean=mean(Petal.Width),std_dv=sd(Sepal.Length)),by=.(Species)]

     Species  mean    std_dv
1:     setosa 0.246 0.3524897
2: versicolor 1.326 0.5161711
3:  virginica 2.026 0.6358796
Rushabh Patel