4

I have a data set that looks like below. In real data set, there are 8619 rows.

Athlete      Competing Country  Year    Total Medals
Michael Phelps    United States 2012    6
Alicia Coutts     Australia     2012    5
Missy Franklin    United States 2012    5
Brian Leetch      United States 2002    1
Mario Lemieux     Canada        2002    1
Ylva Lindberg     Sweden        2002    1
Eric Lindros      Canada        2002    1
Ulrica Lindström  Sweden        2002    1
Shelley Looney    United States 2002    1

and I want to rearrange this data by country, year and the sum of the medals.

I want result like

Country        Year  SumOfMedals
United States  2012  11
United States  2002   2
...

by(newmd$Total.Medals, newmd$Year, FUN=sum)
by(md$Total.Medals, md$Competing.Country, FUN=sum)

I tired to use by argument, but still stuck. Can any of you help me out?

tobby
  • 117
  • 1
  • 2
  • 9

2 Answers2

3

Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'Competing_Country', 'Year', get the sum of 'Total_Medalsand thenorder` by the variables of interest.

library(data.table)
setDT(df1)[,list(SumOfMedals = sum(Total_Medals)), 
   by = .(Competing_Country, Year)
        ][order(-Competing_Country, -Year, -SumOfMedals)]

Or with dplyr, we use the same methodology.

library(dplyr)
df1 %>%
    group_by(Competing_Country, Year) %>%
    summary(SumOfMedals = sum(Total_Medals) %>%
    arrange(desc(Competing_Country), desc(Year), desc(SumOfMedals))

data

 df1 <- structure(list(Athlete = c("Michael Phelps", "Alicia Coutts", 
"Missy Franklin", "Brian Leetch", "Mario Lemieux", "Ylva Lindberg", 
"Eric Lindros", "Ulrica Lindström", "Shelley Looney"), Competing_Country = c("United States", 
"Australia", "United States", "United States", "Canada", "Sweden", 
"Canada", "Sweden", "United States"), Year = c(2012L, 2012L, 
2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L), Total_Medals = c(6L, 
5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Athlete", "Competing_Country", 
"Year", "Total_Medals"), class = "data.frame", row.names = c(NA, 
-9L))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

You can do this pretty easily using aggregate to get the sum of the number of medals:

md2 <- aggregate(cbind(SumOfMedals = Total.Medals) ~ Competing.Country + Year),
          data = md,
          FUN = sum)

The next step is to sort md2 by Competing.Country and SumOfMedals, which is done using the order function:

md2 <- md2[order(Competing.Country, -SumOfMedals),] 

All done.

Andy Clifton
  • 4,926
  • 3
  • 35
  • 47
  • @J.Choi if this works, please consider marking this as the answer by clicking on the green arrow to the left of my post. Thanks! – Andy Clifton Nov 30 '15 at 16:24