2

I made a data frame of people's names, ages, and favorite movies. I want to write a program that takes the data frame and gives me the average age of the people who share each favorite movie. Here's what I have.

 persons <- list(firstName = c("Steve","Bob","Bill","Chris","Matt","Evan"), lastName = c("Williams","Barker","Barker","Williams","Stevenson","Parker"), age = c(22,30,41,14,9,93), favoriteMovie = c("Alien","The Shining","The Shining","Halloween","Alien","Alien"))
 d1 <- data.frame(persons$firstName,persons$lastName,persons$age,persons$favoriteMovie)

 d1
  persons.firstName persons.lastName persons.age persons.favoriteMovie
1             Steve         Williams          22                 Alien
2               Bob           Barker          30           The Shining
3              Bill           Barker          41           The Shining
4             Chris         Williams          14             Halloween
5              Matt        Stevenson           9                 Alien
6              Evan           Parker          93                 Alien

I can do it with a loop of if statements, but I don't think that's the most efficient way. I'm sure there's some way to single out the values for each movie, but I'm not sure what it is.

Paul
  • `mean(d1[ d1$persons.favoriteMovie == "Alien", "persons.age"])` – IRTFM Jun 16 '16 at 22:55
  • Also `tapply(d1$persons.age, d1$persons.favoriteMovie, mean)`. One or more of the methods available should have been illustrated in the introductory material you should be studying. The canonical material is in "Introduction to R" shipped with every copy sent from CRAN. This is also most surely a duplicate SO question. – IRTFM Jun 16 '16 at 22:56
  • I recommend you take a look at the [Quick-R tutorial](http://statmethods.net/). It has good explanations of various ways to work with data. Specifically, check the `Basic statistics` section. – Barranka Jun 16 '16 at 22:56

4 Answers

3

You could try using tapply:

> with(d1, tapply(persons.age, persons.favoriteMovie, mean))
      Alien   Halloween The Shining 
   41.33333    14.00000    35.50000 

You might want to take a look at this answer
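
As a small sketch of what the tapply call gives back: the result is a one-dimensional array named by movie, so the average for a single movie can be looked up by name. The snippet rebuilds the question's d1 so it runs on its own; the variable name `avg_age` is only illustrative.

```r
# Rebuild the data frame from the question so this snippet is self-contained.
d1 <- data.frame(
  persons.firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  persons.lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# tapply() splits the ages by movie and applies mean() to each group.
avg_age <- with(d1, tapply(persons.age, persons.favoriteMovie, mean))

# Look up one movie's average by its name.
avg_age[["The Shining"]]   # 35.5
```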

Jilber Urbina
2

You can use by() for this:

by(d1$persons.age, d1$persons.favoriteMovie, mean)
d1$persons.favoriteMovie: Alien
[1] 41.33333
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: Halloween
[1] 14
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: The Shining
[1] 35.5
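
If a data frame is more convenient than the printed by() object, base R's aggregate() is one alternative (a different function from the by() call above, shown here only as a sketch). The snippet rebuilds the question's d1 so it runs on its own; `avg_by_movie` is an illustrative name.

```r
# Rebuild the data frame from the question so this snippet is self-contained.
d1 <- data.frame(
  persons.firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  persons.lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# The formula reads "age grouped by favorite movie"; FUN is applied per group.
avg_by_movie <- aggregate(persons.age ~ persons.favoriteMovie, data = d1, FUN = mean)

# One row per movie: the grouping column plus the mean age.
avg_by_movie
```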
HubertL
  • R has all these neat little built in functions I'm constantly finding out about. So simple! Thanks! – Paul Jun 16 '16 at 22:57
1

The package doBy with the function summaryBy can help you.

library(doBy)
summaryBy(persons.age~persons.favoriteMovie, data=d1, FUN=c(mean))
#persons.favoriteMovie persons.age.mean
#1                 Alien         41.33333
#2             Halloween         14.00000
#3           The Shining         35.50000

Or you could use dplyr.

library(dplyr)
grouped <- group_by(d1, persons.favoriteMovie)
summarise(grouped, mean=mean(persons.age))
#  persons.favoriteMovie     mean
#                 (fctr)    (dbl)
#1                 Alien 41.33333
#2             Halloween 14.00000
#3           The Shining 35.50000
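
The same dplyr steps can also be chained with the pipe, which avoids the intermediate `grouped` variable. This is a sketch that rebuilds the question's d1 so it runs on its own; the names `avg_by_movie` and `mean_age` are illustrative.

```r
library(dplyr)

# Rebuild the data frame from the question so this snippet is self-contained.
d1 <- data.frame(
  persons.firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  persons.lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# Group by movie, then collapse each group to its mean age.
avg_by_movie <- d1 %>%
  group_by(persons.favoriteMovie) %>%
  summarise(mean_age = mean(persons.age))

avg_by_movie
```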
milan
1

We can use data.table

library(data.table)
setDT(d1)[,.(persons.age = mean(persons.age)) , persons.favoriteMovie]
#   persons.favoriteMovie persons.age
#1:                 Alien    41.33333
#2:           The Shining    35.50000
#3:             Halloween    14.00000
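
Note that setDT() converts d1 to a data.table in place. If the original data frame should stay untouched, one option is to work on a copy made with as.data.table(), and keyby returns the groups sorted by movie. This is a sketch that rebuilds the question's d1 so it runs on its own; `dt`, `avg_by_movie`, and `mean_age` are illustrative names.

```r
library(data.table)

# Rebuild the data frame from the question so this snippet is self-contained.
d1 <- data.frame(
  persons.firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  persons.lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# as.data.table() makes a copy, so d1 itself remains a plain data frame.
dt <- as.data.table(d1)

# keyby groups by movie and sorts the result by the grouping column.
avg_by_movie <- dt[, .(mean_age = mean(persons.age)), keyby = persons.favoriteMovie]

avg_by_movie
```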
akrun