I am doing the Maven Analytics NYC Taxi Challenge and I loaded 4 different CSV files, which add up to about 28M rows.
I noticed that 2 of the data frames had 19 columns instead of 18, so I removed the extra column (congestion_surcharge) from each of them:
taxi_data_2019$congestion_surcharge <- NULL
taxi_data_2020$congestion_surcharge <- NULL
Then I combined all four data frames with union_all:
taxi_data_all <- union_all(taxi_data_2017, taxi_data_2018, taxi_data_2019, taxi_data_2020)
The result of the union_all has about 8M fewer rows than the four inputs combined, for some reason.
I am using the tidyverse (dplyr) to do this.
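For reference, here is a minimal sketch of how the row counts could be checked. It uses small toy data frames in place of the real taxi data, so the names df_2017 to df_2020 and the columns trip_id and fare are hypothetical. dplyr documents union_all as a two-table verb with arguments x and y, so this sketch assumes that any data frames passed after the second one may not be combined, and compares two alternatives against the expected total: bind_rows on all four inputs, and a chain of two-table union_all calls.

```r
library(dplyr)

# Toy stand-ins for the four yearly data frames (hypothetical data).
df_2017 <- tibble(trip_id = 1:3, fare = c(5.0, 7.5, 9.0))
df_2018 <- tibble(trip_id = 4:5, fare = c(6.0, 8.0))
df_2019 <- tibble(trip_id = 6:9, fare = c(4.0, 4.5, 10.0, 12.0))
df_2020 <- tibble(trip_id = 10L, fare = 3.5)

frames <- list(df_2017, df_2018, df_2019, df_2020)

# Row count expected if every input row is kept.
expected_rows <- sum(sapply(frames, nrow))

# bind_rows() accepts any number of data frames, or a list of them.
combined <- bind_rows(frames)

# Chaining two-table union_all() calls is another way to cover all four.
chained <- df_2017 %>%
  union_all(df_2018) %>%
  union_all(df_2019) %>%
  union_all(df_2020)

# Both comparisons should be TRUE if no rows were lost.
nrow(combined) == expected_rows
nrow(chained) == expected_rows
```

Running the same comparison on the real data frames (nrow of each input versus nrow of taxi_data_all) should show whether the missing 8M rows match the size of one or more of the inputs.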