1

I have a dataset, df, that looks like this but has a few million instances:

Date AD Runway MTOW nr.flights 2008-01-01 A 18 376 2 2008-01-01 A 18 376 2 2008-01-01 D 36 190 1 2008-01-02 D 09 150 2 2008-01-02 A 36 280 1 2008-01-02 A 36 280 1

And I want it to look like this:

Date AD Runway MTOW nr.flights 2008-01-01 A 18 752 4 2008-01-01 D 36 190 2 2008-01-02 D 9 150 2 2008-01-02 A 36 560 1

Basically I want to group together all the Date, AD and Runway rows that are the same, so all the duplicates are removed. At the same time, I want the MTOW and nr.flights to be summed up for that particular Date, AD and Runway.

I've tried this: vals <- expand.grid(Date = unique(df$Date), Runway = unique(df$Runway), AD = unique(df$AD))

So I could merge this with the original dataset, df, but that didn't work. I have also tried a few combinations of group_by but that also didn't give me the result that I wanted.

To reproduce:

df <- data.frame(Date=c("2008-01-01","2008-01-01","2008-01-01","2008-01-02","2008-01-02","2008-01-02"),
              AD = c("A", "A", "D", "D", "A", "A"), Runway = c(18, 18, 36, 09, 36,36), 
              MTOW = c(376, 376, 190, 150, 280, 280), nr.flights = c(2,2,1,2,1,1))

Any help would be much appreciated!

Veraaa
  • 301
  • 2
  • 6
  • 15

2 Answers2

5

With library dplyr, you can do something like this:

df %>% group_by(Date, AD, Runway) %>% summarise(MTOW = sum(MTOW), nr.flights = sum(nr.flights))
Source: local data frame [4 x 5]
Groups: Date, AD [?]

        Date     AD Runway  MTOW nr.flights
      (fctr) (fctr)  (dbl) (dbl)      (dbl)
1 2008-01-01      A     18   752          4
2 2008-01-01      D     36   190          1
3 2008-01-02      A     36   560          2
4 2008-01-02      D      9   150          2

There are many such posts and examples already I think.

Gopala
  • 10,363
  • 7
  • 45
  • 77
  • Thank you for answering! This solution worked on my example data set, but somehow it didn't work on my original dataset.. I don't know why yet. But I have the solution now though, so thanks anyway! – Veraaa Apr 22 '16 at 12:55
  • 1
    Best way to thank on stackoverflow is to up vote or accept answers. :) – Gopala Apr 22 '16 at 13:08
1

Here is one that uses the package plyr:

library(plyr)
ddply(df,~Date + AD + Runway,summarise,MTOW=sum(MTOW),nr.flights=sum(nr.flights))
Ravi
  • 3,223
  • 7
  • 37
  • 49