I am trying to group some data in a data frame and perform some calculations on the results.

Take the following data frame, "age_wght":

  Year Last_Name First_Name Age Weight
1 2000     Smith       John  20    145
2 2000     Smith       Matt   9     85
3 2005     Smith       John  25    160
4 2000     Jones        Bob  12    100
5 2000     Jones       Mary  18    120
6 2005     Jones       Mary  23    130
7 2000     Jones     Carrie   9     90
8 2005     Jones        Bob  17    210
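
For anyone who wants to reproduce this, the table above can be rebuilt with something like the following. This is a sketch: the column types, and the choice of `stringsAsFactors = FALSE`, are assumptions rather than something stated in the question.

```r
# Rebuild the example data frame from the printed table.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210),
  stringsAsFactors = FALSE  # assumption: keep names as character, not factor
)
```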

I am trying to get average ages and weights for each person.

I can do this via tapply. Currently I calculate this by creating a new key column in the data frame:

age_wght$key1 = paste(age_wght$Last_Name, age_wght$First_Name, sep = ".")

  Year Last_Name First_Name Age Weight       key1
1 2000     Smith       John  20    145 Smith.John
2 2000     Smith       Matt   9     85 Smith.Matt
3 2005     Smith       John  25    160 Smith.John
4 2000     Jones        Bob  12    100  Jones.Bob
5 2000     Jones       Mary  18    120 Jones.Mary
6 2005     Jones       Mary  23    130 Jones.Mary

Then using tapply as below:

avg_age <- with(age_wght, tapply(Age, key1, FUN = mean))

avg_wght <- with(age_wght, tapply(Weight, key1, FUN = mean))

age_wght_summary <- data.frame(avg_age, avg_wght)

age_wght_summary

But what I get then is something that looks like this:

             avg_age avg_wght
Jones.Bob       14.5    155.0
Jones.Carrie     9.0     90.0
Jones.Mary      20.5    125.0
Smith.John      22.5    152.5
Smith.Matt       9.0     85.0

This makes sense, as I am applying tapply over the key1 index, but my desired outcome is a table with the headers: Last_Name First_Name avg_age avg_wght

I also tried the dplyr library using group_by but was not able to get it to work.

boydok

2 Answers

A dplyr solution

library(dplyr)

age_wght %>%
    group_by(Last_Name, First_Name) %>%
    summarise(avg_age = mean(Age),
              avg_wght = mean(Weight))

#   Last_Name First_Name avg_age avg_wght
#     (fctr)     (fctr)   (dbl)    (dbl)
# 1     Jones        Bob    14.5    155.0
# 2     Jones     Carrie     9.0     90.0
# 3     Jones       Mary    20.5    125.0
# 4     Smith       John    22.5    152.5
# 5     Smith       Matt     9.0     85.0

A data.table solution

library(data.table)
setDT(age_wght)[, .(avg_age = mean(Age), avg_wght = mean(Weight)), by=.(Last_Name, First_Name)]

#    Last_Name First_Name avg_age avg_wght
# 1:     Smith       John    22.5    152.5
# 2:     Smith       Matt     9.0     85.0
# 3:     Jones        Bob    14.5    155.0
# 4:     Jones       Mary    20.5    125.0
# 5:     Jones     Carrie     9.0     90.0
SymbolixAU

A base R solution:

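# Split the "Last.First" row names of the tapply summary back into two parts
nms <- strsplit(rownames(age_wght_summary), split = "\\.")
# sapply (not lapply) returns a character vector, giving one column per name part
data.frame(Last_Name  = sapply(nms, "[", 1),
           First_Name = sapply(nms, "[", 2),
           avg_age    = age_wght_summary$avg_age,
           avg_wght   = age_wght_summary$avg_wght)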
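
As an alternative that skips the key column and the row-name splitting entirely, base R's `aggregate` can group by both name columns directly. This is a minimal sketch, assuming `age_wght` has the columns shown in the question. The data frame is rebuilt here only so the snippet runs on its own.

```r
# Example data, copied from the table in the question.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210),
  stringsAsFactors = FALSE
)

# Average Age and Weight for each Last_Name / First_Name combination.
age_wght_summary <- aggregate(cbind(Age, Weight) ~ Last_Name + First_Name,
                              data = age_wght, FUN = mean)

# Rename the summarised columns to the headers requested in the question.
names(age_wght_summary)[names(age_wght_summary) == "Age"]    <- "avg_age"
names(age_wght_summary)[names(age_wght_summary) == "Weight"] <- "avg_wght"

age_wght_summary
```

The result is an ordinary data frame with Last_Name and First_Name as separate columns, so no names need to be parsed back out of a combined key.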
alexwhitworth