I've been struggling with this problem in R. I've taken the subset of flights that had a delay caused by the carrier, and for each origin city in that subset I need the total number of flights the city had (delayed or not). I can get the total number of flights per city easily with:
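```r
count(flights, ORIGIN_CITY_NAME)
```

but I can't combine that with my data frame because the two don't have the same number of rows: the count covers every city, while `carrierDelayed` only contains cities with at least one carrier delay. How can I filter that list so it only includes cities found in `carrierDelayed`?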
```r
library(dplyr)  # provides count()
flights <- read.csv("airplaneData.csv", header = TRUE, sep = ",")
# Flights whose delay was caused by the carrier
carrierDelayed <- subset(flights, CARRIER_DELAY > 0)
carrierPercent <- data.frame(
  unique(carrierDelayed$ORIGIN_CITY_NAME)
  # , total count for each city should go here
)
```
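A minimal base-R sketch of the filtering step, assuming the column names used above. The handful of `flights` rows are made-up stand-ins for `airplaneData.csv`, and the output column names `ORIGIN_CITY_NAME`/`TOTAL_FLIGHTS` in `carrierPercent` are hypothetical. The idea is to count every origin city once with `table()`, then look up only the cities that appear in `carrierDelayed`:

```r
# Hypothetical stand-in for airplaneData.csv: made-up rows with the two
# columns used in the question.
flights <- data.frame(
  ORIGIN_CITY_NAME = c("Boston, MA", "Boston, MA", "Denver, CO",
                       "Denver, CO", "Denver, CO", "Miami, FL"),
  CARRIER_DELAY    = c(15, 0, 0, 30, NA, 0),
  stringsAsFactors = FALSE
)

# Flights with a positive carrier delay; subset() drops rows where the
# condition is NA.
carrierDelayed <- subset(flights, CARRIER_DELAY > 0)

# Total flights per origin city (delayed or not), indexed by city name.
totalFlights <- table(flights$ORIGIN_CITY_NAME)

# Look up the total only for cities that had a carrier delay.
delayedCities <- unique(carrierDelayed$ORIGIN_CITY_NAME)
carrierPercent <- data.frame(
  ORIGIN_CITY_NAME = delayedCities,
  TOTAL_FLIGHTS    = as.integer(totalFlights[delayedCities]),
  stringsAsFactors = FALSE
)
print(carrierPercent)
```

Staying with dplyr, the same filter can be written as `filter(count(flights, ORIGIN_CITY_NAME), ORIGIN_CITY_NAME %in% carrierDelayed$ORIGIN_CITY_NAME)`, or as `semi_join(count(flights, ORIGIN_CITY_NAME), carrierDelayed, by = "ORIGIN_CITY_NAME")`, which keeps only the count rows whose city has a match in `carrierDelayed`.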