
I've been struggling with this problem in R. I've taken the subset of flights that had a delay due to the carrier, and for each origin city in that subset I need the total number of flights the city had (delayed or not). I can get the total number of flights per city pretty easily with:

count(flights, ORIGIN_CITY_NAME)

but I can't match that up to my data frame since they won't have the same number of rows. How can I filter that list so it only includes cities found in carrierDelayed?

require("dplyr")

flights <- read.csv("airplaneData.csv", header = TRUE, sep = ",")

carrierDelayed <- subset(flights, flights$CARRIER_DELAY > 0)

# The total flight count per city should go in a second column
carrierPercent <- data.frame(unique(carrierDelayed$ORIGIN_CITY_NAME))
vestland
D. Oakley
  • Welcome to SO! Please read how to provide [minimal examples](http://stackoverflow.com/help/mcve) and [reproducible questions](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), then come back and edit your question appropriately. – r2evans Jun 21 '16 at 05:15
  • Did you try `c1 %in% c2` or `?intersect`? – Bulat Jun 21 '16 at 06:16
  • Try: `merge() %>% filter() %>% group_by() %>% summarise(n())`. – zx8754 Jun 21 '16 at 06:21
  • `table(carrierDelayed$ORIGIN_CITY_NAME %in% flights$ORIGIN_CITY_NAME)` sets the frequency to the total number of rows in carrierDelayed for all values; I'm guessing these aren't the correct arguments. – D. Oakley Jun 21 '16 at 06:43
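
For reference, here is a minimal sketch of how the `%in%` suggestion from the comments might be applied. It is not a confirmed solution from the thread. The toy data frame stands in for `airplaneData.csv`, and its values and the `total_flights` column name are assumptions made for illustration. The idea is to count all flights per city first, then keep only the cities that appear in `carrierDelayed`.

```r
library(dplyr)

# Toy stand-in for airplaneData.csv (hypothetical values, illustration only)
flights <- data.frame(
  ORIGIN_CITY_NAME = c("Boston, MA", "Boston, MA", "Boston, MA",
                       "Denver, CO", "Denver, CO", "Austin, TX"),
  CARRIER_DELAY = c(15, 0, NA, 0, 30, 0),
  stringsAsFactors = FALSE
)

# Flights with a carrier delay; filter() drops rows where the condition is NA
carrierDelayed <- filter(flights, CARRIER_DELAY > 0)

# Count every flight per city, then keep only the cities found in carrierDelayed
carrierPercent <- flights %>%
  count(ORIGIN_CITY_NAME, name = "total_flights") %>%
  filter(ORIGIN_CITY_NAME %in% carrierDelayed$ORIGIN_CITY_NAME)

print(carrierPercent)
```

The `table()` attempt in the last comment tests the membership in the other direction. Every city in `carrierDelayed` is also in `flights`, because `carrierDelayed` is a subset of `flights`, so that test is `TRUE` for every row. Testing the per-city counts against `carrierDelayed$ORIGIN_CITY_NAME`, as sketched here, is the direction that filters anything out.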

0 Answers