This is my data frame:

ID=c(1,2,3,4,5,6,7,8,9,10,11,12)
favFruit=c('apple','lemon','pear',
       'apple','apple','pear',
       'apple','lemon','pear',
       'pear','pear','pear')
surveyDate = c('1/1/2005','1/1/2005','1/1/2005',
         '2/1/2005','2/1/2005','2/1/2005',
         '3/1/2005','3/1/2005','3/1/2005',
         '4/1/2005','4/1/2005','4/1/2005')

df<-data.frame(ID,favFruit, surveyDate)

I need to aggregate it so that I can plot a line graph in R of the count of each favFruit by date, with one line per favFruit, but I am unable to create the aggregate table. My data has 45,000 rows, so a manual solution is not possible. This is the table I am trying to produce:

surveyDate   favFruit  count
1/1/2005       apple     1
1/1/2005       lemon     1
1/1/2005       pear      1
2/1/2005       apple     2
2/1/2005       lemon     0
2/1/2005       pear      1
... etc

I tried this, but R printed an error:

df2 <- aggregate(df, favFruit, FUN = sum)

I also tried this, which gave another error:

df2 <- aggregate(df, date ~ favFruit, sum)

I checked for solutions online, but their data generally included a column of quantities, which I don't have, and the solutions were overly complex. Is there an easy way to do this? Thanks in advance. Thank you to whoever suggested the link as a possible duplicate, but that question only has date and number of rows. My question needs the number of rows by date and favFruit (one more column).

Update: Ronak Shah's solution worked. Thanks!

ithoughtso
  • `df2 <- aggregate(surveyDate ~ favFruit, df, length)`. See help page of `?aggregate` for examples and syntax. – Ronak Shah Apr 09 '21 at 04:37
  • Thanks, but the code lists favFruit and total count. How can it be broken down by surveyDate also? I am having trouble with that. I checked the recommended link above but it is not the same question because it is missing another factor. – ithoughtso Apr 09 '21 at 04:57
  • You can do `aggregate(ID~favFruit + surveyDate, df, length)`. To avoid confusion it is always better to include the expected output in your post. – Ronak Shah Apr 09 '21 at 05:01
  • It worked! thanks!! – ithoughtso Apr 09 '21 at 05:08
  • I am not sure how to designate it as the solution. I am sure others will have the same question. Do you know how? – ithoughtso Apr 09 '21 at 05:11
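
For reference, the two-factor call from the comments can be run end to end on the sample data. This is only a minimal sketch: the object name `counts` and the renamed column `count` are introduced here for illustration and do not appear in the thread. Note that `aggregate` returns only the combinations that actually occur, so zero-count pairs such as lemon on 2/1/2005 are missing from the result.

```r
# Sample data from the question
ID <- 1:12
favFruit <- c('apple', 'lemon', 'pear',
              'apple', 'apple', 'pear',
              'apple', 'lemon', 'pear',
              'pear', 'pear', 'pear')
surveyDate <- c('1/1/2005', '1/1/2005', '1/1/2005',
                '2/1/2005', '2/1/2005', '2/1/2005',
                '3/1/2005', '3/1/2005', '3/1/2005',
                '4/1/2005', '4/1/2005', '4/1/2005')
df <- data.frame(ID, favFruit, surveyDate)

# Count rows per favFruit/surveyDate pair: length() of the ID values
# in each group is the number of rows in that group
counts <- aggregate(ID ~ favFruit + surveyDate, data = df, FUN = length)

# Rename the counted column ('count' is an illustrative name)
names(counts)[names(counts) == "ID"] <- "count"

print(counts)
```

Any column without missing values could stand on the left-hand side of the formula in place of `ID`, since only the number of values per group is used.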

1 Answer

The solution provided by Ronak is very good. If you prefer to keep the zero counts in your data frame, you could use the `table` function instead:

data.frame(with(df, table(favFruit, surveyDate)))

Output:

   favFruit surveyDate Freq
1     apple   1/1/2005    1
2     lemon   1/1/2005    1
3      pear   1/1/2005    1
4     apple   2/1/2005    2
5     lemon   2/1/2005    0
6      pear   2/1/2005    1
7     apple   3/1/2005    1
8     lemon   3/1/2005    1
9      pear   3/1/2005    1
10    apple   4/1/2005    0
11    lemon   4/1/2005    0
12     pear   4/1/2005    3
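
Since the goal in the question is a line graph with one line per fruit, the same `table` call can feed a base R plot. The following is a rough sketch, not a definitive solution. It assumes the dates are month/day/year (`"%m/%d/%Y"`); if they are day/month/year, the format string would need to be `"%d/%m/%Y"`. The names `tab` and `dates` are made up for this example.

```r
# Sample data from the question
ID <- 1:12
favFruit <- c('apple', 'lemon', 'pear',
              'apple', 'apple', 'pear',
              'apple', 'lemon', 'pear',
              'pear', 'pear', 'pear')
surveyDate <- c('1/1/2005', '1/1/2005', '1/1/2005',
                '2/1/2005', '2/1/2005', '2/1/2005',
                '3/1/2005', '3/1/2005', '3/1/2005',
                '4/1/2005', '4/1/2005', '4/1/2005')
df <- data.frame(ID, favFruit, surveyDate)

# ASSUMPTION: dates are month/day/year; converting to Date makes them
# sort chronologically instead of alphabetically
df$surveyDate <- as.Date(df$surveyDate, format = "%m/%d/%Y")

# Rows = dates, columns = fruits; combinations with no rows are kept as 0
tab <- with(df, table(surveyDate, favFruit))
dates <- as.Date(rownames(tab))

# One line per fruit; the x axis is drawn by hand so it shows dates
matplot(as.numeric(dates), unclass(tab), type = "l", lty = 1,
        col = seq_len(ncol(tab)), xaxt = "n",
        xlab = "Survey date", ylab = "Count")
axis(1, at = as.numeric(dates), labels = format(dates, "%m/%d/%Y"))
legend("topleft", legend = colnames(tab),
       col = seq_len(ncol(tab)), lty = 1)
```

Keeping the zero counts matters here: without them, the line for a fruit would skip dates on which nobody chose it rather than dropping to zero.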
TarJae