Operating on R Data Frames

Question

I made a data frame of people's names, ages, and favorite movies. I want to write a program that takes the data frame and gives me the average age of the people who share a specific favorite movie. Here's what I have.

 persons <- list(firstName = c("Steve","Bob","Bill","Chris","Matt","Evan"), lastName = c("Williams","Barker","Barker","Williams","Stevenson","Parker"), age = c(22,30,41,14,9,93), favoriteMovie = c("Alien","The Shining","The Shining","Halloween","Alien","Alien"))
 d1 <- data.frame(persons$firstName,persons$lastName,persons$age,persons$favoriteMovie)

 d1
  persons.firstName persons.lastName persons.age persons.favoriteMovie
1             Steve         Williams          22                 Alien
2               Bob           Barker          30           The Shining
3              Bill           Barker          41           The Shining
4             Chris         Williams          14             Halloween
5              Matt        Stevenson           9                 Alien
6              Evan           Parker          93                 Alien

I can do it with a loop of if statements, but I don't think that's the most efficient way. I'm sure there's some way to single out the matching values, but I don't know what it is.

`mean(d1[ d1$persons.favoriteMovie == "Alien", "persons.age"])` – IRTFM, Jun 16 '16 at 22:55
Also `tapply( d1$persons.age, d1$persons.favoriteMovie, mean)`. One or more of the available methods should have been illustrated in the introductory material you should be studying. The canonical material is "An Introduction to R", shipped with every copy sent from CRAN. This is also most surely a duplicate SO question. – IRTFM, Jun 16 '16 at 22:56
I recommend you take a look at the [Quick-R tutorial](http://statmethods.net/). It has good explanations of various ways to work with data. Specifically, check the `Basic statistics` section. – Barranka, Jun 16 '16 at 22:56
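
To make the "single out values" idea concrete, here is a minimal base R sketch of logical indexing as an alternative to a loop of if statements. It assumes the data frame is built directly from named vectors; the names `people` and `alien_mean` are illustrative and not from the original post, and the column names here are shorter than those of `d1`.

```r
# Build the data frame from named vectors so the columns get clean names
# (firstName, age, ...) instead of persons.firstName, persons.age, ...
people <- data.frame(
  firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  age = c(22, 30, 41, 14, 9, 93),
  favoriteMovie = c("Alien", "The Shining", "The Shining",
                    "Halloween", "Alien", "Alien"),
  stringsAsFactors = FALSE
)

# The comparison yields one TRUE/FALSE per row; indexing with it keeps
# only the ages of people whose favorite movie is "Alien".
is_alien <- people$favoriteMovie == "Alien"
alien_mean <- mean(people$age[is_alien])
print(alien_mean)
```

The comparison is vectorized, so no explicit loop is needed to pick out the matching rows.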

score 3 · Answer 1 · edited May 23 '17 at 12:31

You could try using tapply:

> with(d1, tapply(persons.age, persons.favoriteMovie, mean))
      Alien   Halloween The Shining 
   41.33333    14.00000    35.50000

You might want to take a look at this answer.
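
As a follow-up sketch, the result of `tapply` is a named array, so a single movie's mean can be looked up by name. If a data frame is preferred as the result, base R's `aggregate()` is one alternative; it is not part of the answer above and is shown here only for comparison. The block rebuilds just the two columns it needs from the question's `d1`, and `means`, `alien_mean`, and `by_movie` are illustrative names.

```r
# Rebuild the two relevant columns of the question's data frame so this
# block runs on its own.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# tapply() returns a named array: one mean per favorite movie.
means <- tapply(d1$persons.age, d1$persons.favoriteMovie, mean)

# Look up a single group by its name and drop the name.
alien_mean <- as.numeric(means["Alien"])

# aggregate() computes the same group means but returns a data frame
# with one row per movie.
by_movie <- aggregate(persons.age ~ persons.favoriteMovie, data = d1, FUN = mean)
print(by_movie)
```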

answered Jun 16 '16 at 22:57

Jilber Urbina

Thanks for that link, gonna read through it now. Looks like a lot of good stuff. – Paul Jun 16 '16 at 22:59

score 2 · Answer 2 · answered Jun 16 '16 at 22:55

You can use by() for this:

by(d1$persons.age, d1$persons.favoriteMovie, mean)
d1$persons.favoriteMovie: Alien
[1] 41.33333
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: Halloween
[1] 14
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: The Shining
[1] 35.5

HubertL

R has all these neat little built in functions I'm constantly finding out about. So simple! Thanks! – Paul Jun 16 '16 at 22:57

milan · Answer 3 · 2016-06-17T01:54:30.443

The package doBy with the function summaryBy can help you.

library(doBy)
summaryBy(persons.age~persons.favoriteMovie, data=d1, FUN=c(mean))
#persons.favoriteMovie persons.age.mean
#1                 Alien         41.33333
#2             Halloween         14.00000
#3           The Shining         35.50000

Or you could use dplyr.

library(dplyr)
grouped <- group_by(d1, persons.favoriteMovie)
summarise(grouped, mean=mean(persons.age))
#  persons.favoriteMovie     mean
#                 (fctr)    (dbl)
#1                 Alien 41.33333
#2             Halloween 14.00000
#3           The Shining 35.50000

score 1 · Answer 4 · answered Jun 17 '16 at 02:46

We can use data.table

library(data.table)
setDT(d1)[,.(persons.age = mean(persons.age)) , persons.favoriteMovie]
#   persons.favoriteMovie persons.age
#1:                 Alien    41.33333
#2:           The Shining    35.50000
#3:             Halloween    14.00000
