I am trying to group some data in a data frame and perform some calculations on the results via a loop.
Take the following data frame, "age_wght":
  Year Last_Name First_Name Age Weight
1 2000     Smith       John  20    145
2 2000     Smith       Matt   9     85
3 2005     Smith       John  25    160
4 2000     Jones        Bob  12    100
5 2000     Jones       Mary  18    120
6 2005     Jones       Mary  23    130
7 2000     Jones     Carrie   9     90
8 2005     Jones        Bob  17    210
I am trying to get average ages and weights for each person.
I can do this via tapply. Currently I calculate it by first creating a new key column in the data frame:
age_wght$key1 = paste(age_wght$Last_Name, age_wght$First_Name, sep = ".")
  Year Last_Name First_Name Age Weight       key1
1 2000     Smith       John  20    145 Smith.John
2 2000     Smith       Matt   9     85 Smith.Matt
3 2005     Smith       John  25    160 Smith.John
4 2000     Jones        Bob  12    100  Jones.Bob
5 2000     Jones       Mary  18    120 Jones.Mary
6 2005     Jones       Mary  23    130 Jones.Mary
Then using tapply as below:
avg_age <- with(age_wght, tapply(Age, key1, FUN = mean))
avg_wght <- with(age_wght, tapply(Weight, key1, FUN = mean))
age_wght_summary <- data.frame(avg_age, avg_wght)
age_wght_summary
But what I get then is something that looks like this:
             avg_age avg_wght
Jones.Bob       14.5    155.0
Jones.Carrie     9.0     90.0
Jones.Mary      20.5    125.0
Smith.John      22.5    152.5
Smith.Matt       9.0     85.0
This makes sense, since tapply uses key1 as the index and the keys end up as row names, but my desired outcome is a table with the headers:
Last_Name First_Name avg_age avg_wght
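For reference, here is a minimal sketch of one way that layout might be produced in base R without the key1 column, assuming the data frame is rebuilt exactly as printed above. It uses aggregate() with a formula that groups on both name columns; the renaming step and the final sort are only there to match the headers and row order I am after, and are not the only way to do it.

```r
# Rebuild the example data frame from the question.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210)
)

# Average Age and Weight for each Last_Name / First_Name combination.
# The grouping columns are kept as ordinary columns in the result.
age_wght_summary <- aggregate(
  cbind(Age, Weight) ~ Last_Name + First_Name,
  data = age_wght,
  FUN  = mean
)

# Rename the summarised columns to the desired headers.
names(age_wght_summary)[names(age_wght_summary) == "Age"]    <- "avg_age"
names(age_wght_summary)[names(age_wght_summary) == "Weight"] <- "avg_wght"

# Sort by last name, then first name, and reset the row names.
age_wght_summary <- age_wght_summary[
  order(age_wght_summary$Last_Name, age_wght_summary$First_Name), ]
rownames(age_wght_summary) <- NULL

print(age_wght_summary)
```

The averages should agree with the tapply output above; only the layout differs, with the names in their own columns instead of in the row names.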
I also tried the dplyr library using group_by but was not able to get it to work.
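What I was aiming for with dplyr is roughly the sketch below, assuming dplyr 1.0.0 or later is installed (the .groups argument of summarise() is not available in older versions). It groups on both name columns and computes the two means in a single summarise() call; the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data frame from the question.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210)
)

# One row per person: group on both name columns, then average Age and Weight.
# .groups = "drop" returns an ungrouped result.
age_wght_summary <- age_wght %>%
  group_by(Last_Name, First_Name) %>%
  summarise(
    avg_age  = mean(Age),
    avg_wght = mean(Weight),
    .groups  = "drop"
  )

print(age_wght_summary)
```

The result is a tibble with the columns Last_Name, First_Name, avg_age and avg_wght; as.data.frame(age_wght_summary) converts it back to a plain data frame if needed.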