I have a dataset; a representative sample is below:
ORIGIN <- c("SFO", "BOS", "LAX", "BOS")
DEST <- c("BOS", "SFO", "BOS", "LAX")
PASS <- c(100, 50, 20, 40)
df <- data.frame(ORIGIN, DEST, PASS)
ORIGIN DEST PASS
SFO BOS 100
BOS SFO 50
LAX BOS 20
BOS LAX 40
I'm trying to combine ORIGIN and DEST together to form ROUTE, and then aggregate the rows by route, without regard to the origin or destination, to get total passenger numbers.
Ideally I would like the below:
ROUTE PASS
SFO-BOS 150
LAX-BOS 60
I don't care which direction's name is used for the route, as long as the two directions are combined.
As a first step, I've used this:
df$ROUTE<-paste(df$ORIGIN, df$DEST, sep="-")
To get the below:
ORIGIN DEST ROUTE PASS
SFO BOS SFO-BOS 100
BOS SFO BOS-SFO 50
LAX BOS LAX-BOS 20
BOS LAX BOS-LAX 40
However, while I would usually use something like ddply to get the passenger sum, I'm unsure how to proceed because "BOS-SFO" and "SFO-BOS" need to be treated as the same route.
This would not be an issue in a small file, but the file I'm working with is very large.
Would someone be able to help?
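For reference, here is a minimal sketch of one possible approach, not necessarily the best one for a very large file. It assumes ORIGIN and DEST are character columns (not factors) and that ordering each pair alphabetically is an acceptable way to name the route. It builds an order-independent key with base R's pmin() and pmax(), then sums with aggregate(); the variable name totals is just illustrative.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# airport codes as character so pmin()/pmax() can compare them.
df <- data.frame(
  ORIGIN = c("SFO", "BOS", "LAX", "BOS"),
  DEST   = c("BOS", "SFO", "BOS", "LAX"),
  PASS   = c(100, 50, 20, 40),
  stringsAsFactors = FALSE
)

# Order-independent key: the alphabetically smaller code always comes first,
# so SFO->BOS and BOS->SFO both become "BOS-SFO".
df$ROUTE <- paste(pmin(df$ORIGIN, df$DEST), pmax(df$ORIGIN, df$DEST), sep = "-")

# Sum passengers per undirected route.
totals <- aggregate(PASS ~ ROUTE, data = df, FUN = sum)
print(totals)
```

With this key the routes come out named "BOS-SFO" and "BOS-LAX" rather than "SFO-BOS" and "LAX-BOS", which should be fine given that the naming doesn't matter. The same ROUTE column could be passed to ddply instead of aggregate() if that is the preferred tool.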