
I really hope this is not a duplicate, but I can't find an answer that applies to my case.

I have panel data and I need to calculate the mean score by year. My data looks something like this:

df <- data.frame(
"Country" = c("USA", "EU", "Africa","USA", "EU", "Africa","USA", "EU", "Africa"),
"Year" = c(1970, 1970, 1970, 1980, 1980, 1980,1990, 1990, 1990), 
"Score" = runif(9, min=20, max=100),
"Other" = rnorm(9),
stringsAsFactors = FALSE)

My goal is to calculate the mean "Score" for every year, in other words the mean across all countries for 1970, for 1980, and for 1990.

I have tried grouping the data first:

mean<- df %>%
  group_by(Year) %>%
  summarise(mean(na.omit(df$Score)))

But this code gives me the overall mean of all the scores, not the mean for each year.

I have also tried ddply, but for some reason it does not seem to work:

mean2 <- ddply(.data = df, variables = .(Year), (mean(df$Score)))

Does anyone know of intuitive code that I could use on a large dataset?

Thanks a lot!

Alex

1 Answer


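The issue is that you used df$Score rather than just Score. Inside summarise, Score refers only to the rows of the current group, whereas df$Score is the entire column of the original, ungrouped data frame, so it kills the grouping effect. Instead we want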

library(dplyr)
df %>% group_by(Year) %>% 
  summarise(meanScore = mean(Score, na.rm = TRUE))
# A tibble: 3 x 2
#    Year meanScore
#   <dbl>     <dbl>
# 1  1970      80.0
# 2  1980      69.9
# 3  1990      52.9
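To see the difference directly, here is a small sketch on made-up toy data (the data frame and the column names wrong/right are purely illustrative). The df$Score version returns the same overall mean in every group, while the bare Score version is evaluated per group:

```r
library(dplyr)

# Toy data: two years with two scores each (illustrative values only)
df <- data.frame(Year = c(1970, 1970, 1980, 1980), Score = c(10, 20, 30, 50))

res <- df %>%
  group_by(Year) %>%
  summarise(
    wrong = mean(df$Score),  # whole column of df: same value (27.5) in every group
    right = mean(Score)      # only the current group's rows: 15 and 40
  )
print(res)
```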

The same with ddply from the plyr package:

library(plyr)
ddply(df, .(Year), summarise, meanScore = mean(Score, na.rm = TRUE))
#   Year meanScore
# 1 1970  80.02505
# 2 1980  69.92299
# 3 1990  52.87667

Of course, you can also do it in base R, as in

tapply(df$Score, df$Year, mean, na.rm = TRUE)
#     1970     1980     1990 
# 80.02505 69.92299 52.87667 
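If you would rather get a data frame than the named vector that tapply returns, base R's aggregate() with a formula is another option. A minimal sketch, assuming the same df layout as in the question (set.seed is added here only to make the random data reproducible, so the numbers will differ from the output shown above):

```r
set.seed(42)  # reproducible random scores for this sketch
df <- data.frame(
  Country = c("USA", "EU", "Africa", "USA", "EU", "Africa", "USA", "EU", "Africa"),
  Year = c(1970, 1970, 1970, 1980, 1980, 1980, 1990, 1990, 1990),
  Score = runif(9, min = 20, max = 100),
  stringsAsFactors = FALSE
)

# Score ~ Year: compute mean(Score) within each value of Year.
# The formula method uses na.action = na.omit by default, so rows with a
# missing Score are dropped before the means are computed.
meanByYear <- aggregate(Score ~ Year, data = df, FUN = mean)
print(meanByYear)
```

The result has one row per year, with columns Year and Score.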
Julius Vainora