
I'm very new to R and hope that someone can help me with this. I have a data.frame that looks, for example, like this:

Year    month   d   class
2009    200901  1   a
2009    200901  1   b
2009    200902  2   a
2009    200902  1   b
2009    200902  1   c
2009    200903  5   a
2009    200903  1   b
2009    200903  1   c
2009    200903  3   a
2010    201001  1   a
2010    201001  4   b
2010    201002  1   a
2010    201002  7   b
2010    201002  1   c
2010    201003  2   a
2010    201003  4   b
2010    201003  2   c
2010    201003  1   a

I would like to build a cross table from it, and the result should look like this:

Year       a        b      c
2009       3.667    1      0.667
2010       1.667    5      1

First I would like to sum the d values for each month per class, and then take the average over those months to get one number per year for each class.

Thanks a lot.

Jaap
thd
  • Look at `cast` from the `reshape` package. – Pankaj Kaundal Sep 27 '16 at 13:05
  • So do you have a `data.frame` or a `data.table`? What do you mean by "summary all the data"? It is also generally good practice to show the code that you have tried and that failed. In addition, make sure that all sample data you share is [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos Sep 27 '16 at 13:13
  • What I have now is a data.frame. Before, I used `table`: `table(df$year, df$class)`, but it would take the sum over all the months. Or I could use `tapply` to take the mean or sum. – thd Sep 27 '16 at 13:27
  • How can c have the values 0.667 and 1 in your result table? Shouldn't they be 1 and 1.5? – Sandipan Dey Sep 27 '16 at 13:29
  • yes, you are right @sandipan. – thd Sep 27 '16 at 13:33
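The `tapply` idea mentioned in the comments can be made to work by applying it twice: once to sum per month and class, and once to average those monthly sums per year. The following is only a sketch in base R, assuming the column names `Year`, `month`, `d` and `class` from the question and that the first four characters of `month` encode the year; the names `monthly`, `yr` and `yearly` are made up for illustration.

```r
# Rebuild the sample data from the question
df <- data.frame(
  Year  = rep(c(2009, 2010), each = 9),
  month = c(200901, 200901, 200902, 200902, 200902, 200903, 200903, 200903, 200903,
            201001, 201001, 201002, 201002, 201002, 201003, 201003, 201003, 201003),
  d     = c(1, 1, 2, 1, 1, 5, 1, 1, 3, 1, 4, 1, 7, 1, 2, 4, 2, 1),
  class = c("a", "b", "a", "b", "c", "a", "b", "c", "a",
            "a", "b", "a", "b", "c", "a", "b", "c", "a")
)

# Step 1: sum d per month (rows) and class (columns);
# month/class combinations that never occur become NA
monthly <- tapply(df$d, list(df$month, df$class), sum)

# Step 2: derive the year of each row from the month label (assumption: YYYYMM)
yr <- substr(rownames(monthly), 1, 4)

# Step 3: for each class column, average the monthly sums within each year,
# ignoring months in which the class does not occur
yearly <- apply(monthly, 2, function(col) tapply(col, yr, mean, na.rm = TRUE))
yearly
```

With the sample data, `yearly` is a 2 x 3 matrix whose values agree with the corrected table discussed in the comments (c is 1 for 2009 and 1.5 for 2010). Whether missing month/class combinations should be ignored or counted as 0 is a design choice; this sketch ignores them.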

2 Answers


A solution with `tidyr` and `dplyr`, where `dat` is your data frame.

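library(tidyr)
library(dplyr)

dat %>%
    # sum d within each Year/month/class combination
    group_by(Year, month, class) %>% summarise(d = sum(d)) %>%
    # one column per class, then average the monthly sums within each year
    spread(class, d) %>% group_by(Year) %>%
    summarise(a = mean(a, na.rm = TRUE), b = mean(b, na.rm = TRUE), c = mean(c, na.rm = TRUE))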

The output is:

# A tibble: 2 x 4
   Year        a     b     c
  <int>    <dbl> <dbl> <dbl>
1  2009 3.666667     1   1.0
2  2010 1.666667     5   1.5
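For readers on newer package versions, the same pipeline might be written with `pivot_wider()` and `across()`. This is a sketch rather than part of the original answer, and it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later; the sample data is rebuilt so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data from the question
dat <- data.frame(
  Year  = rep(c(2009, 2010), each = 9),
  month = c(200901, 200901, 200902, 200902, 200902, 200903, 200903, 200903, 200903,
            201001, 201001, 201002, 201002, 201002, 201003, 201003, 201003, 201003),
  d     = c(1, 1, 2, 1, 1, 5, 1, 1, 3, 1, 4, 1, 7, 1, 2, 4, 2, 1),
  class = c("a", "b", "a", "b", "c", "a", "b", "c", "a",
            "a", "b", "a", "b", "c", "a", "b", "c", "a")
)

result <- dat %>%
  # sum d within each Year/month/class combination
  group_by(Year, month, class) %>%
  summarise(d = sum(d), .groups = "drop") %>%
  # one column per class; missing combinations become NA
  pivot_wider(names_from = class, values_from = d) %>%
  # average the monthly sums per year, skipping the NA months
  group_by(Year) %>%
  summarise(across(c(a, b, c), ~ mean(.x, na.rm = TRUE)))

result
```

Here `across(c(a, b, c), ...)` replaces the three hand-written `mean()` calls, which may be convenient when there are more classes than in this example.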
Roland

Try this (df is the original data frame):

df

   Year  month d class
1  2009 200901 1     a
2  2009 200901 1     b
3  2009 200902 2     a
4  2009 200902 1     b
5  2009 200902 1     c
6  2009 200903 5     a
7  2009 200903 1     b
8  2009 200903 1     c
9  2009 200903 3     a
10 2010 201001 1     a
11 2010 201001 4     b
12 2010 201002 1     a
13 2010 201002 7     b
14 2010 201002 1     c
15 2010 201003 2     a
16 2010 201003 4     b
17 2010 201003 2     c
18 2010 201003 1     a

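library(reshape2)
# sum d per month and class, then average the monthly sums per year
df1 <- aggregate(d ~ month + class + Year, df, sum)
df1 <- aggregate(d ~ class + Year, df1, mean)
# reshape to one column per class
dcast(df1, Year ~ class, value.var = "d")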

  Year        a b   c
1 2009 3.666667 1 1.0
2 2010 1.666667 5 1.5
Sandipan Dey
  • `df1 <- aggregate(d ~ month + class + Year, df, sum)` does not work; I get `Error in model.frame.default`. – thd Sep 27 '16 at 13:35
  • Is your df a data frame? Are you using exactly the same code? It should work; it works for me. – Sandipan Dey Sep 27 '16 at 13:37
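The thread does not say what caused the error, so the following is only a guess at how one might narrow it down. The formula in `aggregate()` refers to the columns `Year`, `month`, `d` and `class` by exact, case-sensitive name, and an earlier comment uses `df$year` in lower case, so a name mismatch is one possibility. The helper `check_aggregate_input()` is hypothetical and written just for this sketch.

```r
# Hypothetical helper: checks the inputs that the aggregate() formula relies on
check_aggregate_input <- function(df, needed = c("Year", "month", "d", "class")) {
  # if no object called df was created, the name df refers to a function in stats
  if (!is.data.frame(df)) {
    return("df is not a data frame")
  }
  # column names are case-sensitive: "year" does not match "Year"
  missing_cols <- setdiff(needed, names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  # sum() needs a numeric column
  if (!is.numeric(df$d)) {
    return("column d is not numeric")
  }
  "ok"
}

good <- data.frame(Year = 2009, month = 200901, d = 1, class = "a")
bad  <- data.frame(year = 2009, month = 200901, d = 1, class = "a")

check_aggregate_input(good)
check_aggregate_input(bad)
```

If the check returns "ok" and the error persists, posting the output of `str(df)` together with the full error message would give others more to go on.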