12

Possible Duplicate:
R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs.

I'm using R and would love some help with a problem I'm having:

I have a dataframe (df) with a column ID and a column Emotion. Each value in ID corresponds to 40-300 values in Emotion (so it's not a set number). I need to calculate the mean of the Emotion values for each ID. This is what the data looks like:

df$ID = c(1, 1, 1, 1, 2, 2, 3)
df$Emotion = c(2, 4, 6, 4, 1, 1, 8)

so the vector of means should look like this: (4, 1, 8)

Any help would be greatly appreciated!

Paul Meinshausen
  • 12
    On the contrary, I searched for a long, long time (though I'm new to searching so perhaps I didn't phrase my search terms appropriately). I wasn't able to find anything remotely as clear and direct as the answers provided, so I'm glad I asked the question. – Paul Meinshausen Nov 17 '12 at 02:11
  • 4
    And your possible duplicate suggestion is buried under a lot of jargon I'm not familiar with yet. But I'm learning! – Paul Meinshausen Nov 17 '12 at 02:11
  • To find the suggested duplicate I searched the R tag for the word "grouping". To search a tag it's "[r] grouping". That returned 288 questions. I sorted by votes and picked the most popular with 108 votes. Further, with the hindsight that you needed the tapply function, "[r] tapply" returns 426 questions. By looking at those questions you can pick up the jargon to improve your searches next time. – Matt Dowle Nov 17 '12 at 22:11

2 Answers

23

You can use aggregate

ID = c(1, 1, 1, 1, 2, 2, 3)
Emotion = c(2, 4, 6, 4, 1, 1, 8)
df <- data.frame(ID, Emotion)


aggregate(.~ID, data=df, mean)
   ID Emotion
1  1       4
2  2       1
3  3       8

sapply could also be useful (this other solution will give you a vector)

sapply(split(df$Emotion, df$ID), mean) 
1 2 3 
4 1 8 

There are many other ways to do it, including ddply from the plyr package, the data.table package, other combinations of split and lapply, and dcast from the reshape2 package. See this question for further solutions.
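As one illustration of the "split and lapply" combination mentioned above, here is a minimal base R sketch. It assumes the sample data from the question; the intermediate names `groups` and `means_list` are only illustrative.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# split() cuts Emotion into one numeric vector per ID value
groups <- split(df$Emotion, df$ID)

# lapply() applies mean() to each group and returns a named list
means_list <- lapply(groups, mean)

# unlist() flattens the list into a named numeric vector
means <- unlist(means_list)
print(means)
# 1 2 3 
# 4 1 8 
```

The names of the result are the ID values as character strings; wrap the result in unname() if only the bare vector of means is wanted.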

Jilber Urbina
  • 1
    You too. How many times have we seen this before? S.O. is not a helpdesk. By answering, and not voting to close as a duplicate, you're degrading S.O. into a helpdesk. – Matt Dowle Nov 17 '12 at 01:27
  • 8
    What I often do is vote to close and provide a quick answer. That way there is some value in the question if someone stumbles upon it. In addition, whether or not someone answers this question will probably not influence the probability that it will occur again. The SO system can respond with a clean up if the question is deemed duplicate. I agree with your message, but object to downvoting people who make a different choice than you. – Paul Hiemstra Nov 18 '12 at 10:46
  • @PaulHiemstra Where's the value in duplicating information in the duplicate link? Whether someone answers _this_ question may not influence whether _this_ question will be asked again, as much as it will give the impression that it is ok to ask (and indeed answer) duplicate questions in general. As demonstrated by DWin already; he said as much in his comment. On choices, that would be fair if S.O. was a democracy. But it isn't. Their rules/guidance, not mine: [do your homework](http://stackoverflow.com/questions/how-to-ask). – Matt Dowle Nov 19 '12 at 11:12
  • @PaulHiemstra It may have seemed a bit unfair to pick on DWin, too, since it seems to be a problem in general. But, with a rep of 48.1k, the 2nd highest rep in the R tag, I think he can take one little downvote. Perhaps unfair to include Jilber too, but then it seemed unfair to downvote DWin in the context of this question, but not Jilber. – Matt Dowle Nov 19 '12 at 11:53
  • DWin can take some picking on ;), it is the tone in your response that rubbed me the wrong way. – Paul Hiemstra Nov 19 '12 at 13:10
  • @PaulHiemstra Yes I suppose I reached a tipping point. It had been building up. – Matt Dowle Nov 20 '12 at 18:04
  • @Jilber Apologies for misuse of downvote. Now reversed. – Matt Dowle Nov 27 '12 at 01:10
10

This is precisely the job tapply was designed to do.

tapply(df$Emotion, df$ID, mean)
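A short sketch of how tapply lines up with the question's sample data: the values to summarise go first and the grouping vector second, so the result has one mean per ID. The data frame below just rebuilds the example from the question.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# tapply(X, INDEX, FUN): X holds the values, INDEX defines the groups
means <- tapply(df$Emotion, df$ID, mean)
print(means)
# 1 2 3 
# 4 1 8 

# Drop the ID labels to get the plain vector of means
mean_vector <- as.vector(means)
```

Swapping the first two arguments would instead average the ID values within each distinct Emotion value, which is not what the question asks for.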
Matt Dowle
IRTFM
  • -1 from me DWin. This is one of the simplest questions imaginable. By not voting to close as a duplicate you appear to be turning S.O. into a helpdesk. – Matt Dowle Nov 17 '12 at 01:23
  • 1
    I think the admins are giving us that message. There are penalties for referring things to the moderators and the homework tag was removed. I'll join you in voting to close, but there are quite a few other questions on SO that are duplicates and the interface seems to not be designed to encourage the work needed to flag them. The opposite message is the one I am getting. – IRTFM Nov 17 '12 at 01:59
  • Do you have any links to the opposite message please? – Matt Dowle Nov 17 '12 at 21:56
  • 4
    @MatthewDowle Downvoting is not to be used for this purpose! Whilst I agree that duplicates are not to be encouraged, there is a strong argument, voiced on Meta and on the SE blog, that duplicates aren't a problem and answering them is OK. The reason is that one may have a problem that the cognoscenti know is a duplicate but not be capable of putting the problem into the same words used in the already-asked question on Stack Overflow. Of course a balance has to be found... – Gavin Simpson Nov 26 '12 at 19:53
  • @GavinSimpson Ok, do you have some links please? I did look myself before I asked DWin for links but I couldn't find any. That's why I asked for some references. I agree in general, but this is a question about very basic _grouping_. I very very rarely downvote, but made an exception on this one. – Matt Dowle Nov 26 '12 at 23:08
  • 1
    @MatthewDowle You shouldn't need a link for this. Ignore the duplicate issue; would you downvote this answer because it is intrinsically bad or flawed or wrong? No. So you shouldn't downvote. Your comments should be enough chastisement. – Gavin Simpson Nov 26 '12 at 23:18
  • @GavinSimpson I was referring to links about downvoting, not duplicates, but realise now that was unclear. Here's a link that appears to encourage downvoting : http://meta.stackexchange.com/a/9961/171363. "The up/down vote system is not just about rep, it is the quality control mechanism for Stack Overflow.". "It is not designed to be a personal attack against the users in question.". I can't find anything close to saying that downvoting isn't to be used for the purpose I used it. – Matt Dowle Nov 26 '12 at 23:29
  • @MatthewDowle My point was (and I read that link as concurring with me) that the voting should reflect the quality of the Answer not whether you think DWin should or should not have Answered. That link was about a Question where "Does not show research effort" is valid reason to Downvote. [See this Meta posting](http://meta.stackexchange.com/questions/2451/why-do-you-cast-downvotes-on-answers) for a wide-ranging coverage of what people consider as reasons for downvoting Answers. – Gavin Simpson Nov 27 '12 at 00:08
  • @GavinSimpson I don't see that in the link I gave, but I see what you mean from yours. Thanks for correcting me. At least I included a comment with the downvote, explaining the reason for it, and both answerers did subsequently join me in closing as duplicate. So I've undownvoted both answers now. And next time, I'll just comment, as you suggest. – Matt Dowle Nov 27 '12 at 01:08
  • @DWin Apologies for misuse of downvote. Now reversed. – Matt Dowle Nov 27 '12 at 01:10
  • @MatthewDowle: No need to worry. As you said a downvote isn't going to bother either of us. And your massive contributions to R are greatly appreciated. – IRTFM Nov 27 '12 at 07:29