
I am doing the Maven Analytics NYC Taxi Challenge and loaded four different CSV files, which add up to about 28M rows.

I noticed that two of the data frames had 19 columns instead of 18, so I removed the extra column from them:

```r
taxi_data_2019$congestion_surcharge <- NULL
taxi_data_2020$congestion_surcharge <- NULL
```
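A minimal sketch of finding which column differs between two tables and dropping it with dplyr, assuming small made-up data frames. `df_a` and `df_b` are hypothetical stand-ins for the taxi tables, not the real data.

```r
library(dplyr)

# Hypothetical toy tables: df_b has one extra column, like the 2019 and 2020 files
df_a <- data.frame(fare = c(5, 7), tip = c(1, 0))
df_b <- data.frame(fare = c(9, 4, 6), tip = c(2, 1, 0),
                   congestion_surcharge = c(2.5, 0, 2.5))

# Columns present in df_b but missing from df_a
extra_cols <- setdiff(names(df_b), names(df_a))
print(extra_cols)
#> [1] "congestion_surcharge"

# Drop them; same effect as df_b$congestion_surcharge <- NULL
df_b_trimmed <- select(df_b, -all_of(extra_cols))
```

Computing the difference with `setdiff()` avoids hard-coding the column name if other files turn out to differ too.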

Then I combined all four with `union_all()`:

```r
taxi_data_all <- union_all(taxi_data_2017, taxi_data_2018, taxi_data_2019, taxi_data_2020)
```

The result of the `union_all()` has about 8M fewer rows than the four inputs combined, and I can't see why.

I am using the tidyverse to do this.

camille
  • You could improve your chances of finding help here by adding a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Adding an MRE and an example of the desired output (in code form, not tables and pictures) makes it much easier for others to find and test an answer to your question. That way you can help others to help you! P.S. Here is [a good overview on how to ask a good question](https://stackoverflow.com/help/how-to-ask) – dario Oct 18 '21 at 13:07
  • Check the number of rows per dataset. – zx8754 Oct 18 '21 at 13:16
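Following the suggestion to check row counts, here is a minimal sketch that compares a combined table against the sum of its inputs. The toy tables and the helper `rows_match()` are hypothetical, not part of the question.

```r
library(dplyr)

# Hypothetical stand-ins for taxi_data_2017 ... taxi_data_2020
tables <- list(
  t2017 = data.frame(fare = c(5, 7)),
  t2018 = data.frame(fare = c(9, 4, 6)),
  t2019 = data.frame(fare = c(3, 8)),
  t2020 = data.frame(fare = 10)
)

# Number of rows in each input table
rows_per_table <- vapply(tables, nrow, integer(1))
print(rows_per_table)

# Hypothetical helper: TRUE if the combined table has as many rows as all inputs together
rows_match <- function(inputs, combined) {
  nrow(combined) == sum(vapply(inputs, nrow, integer(1)))
}

# Combining only the first two tables leaves rows out (5 instead of 8)
rows_match(tables, union_all(tables$t2017, tables$t2018))

# Stacking all four accounts for every row
rows_match(tables, bind_rows(tables))
```

If the shortfall equals the row count of the last two tables, that points at those tables never being appended.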

1 Answer


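I think `union_all()` does not work with more than two tables: it combines only its first two arguments (`x` and `y`), so the other tables are not appended. Try `bind_rows()` instead; it is efficient and accepts any number of data frames, so you can pass the same list of tables you used in your code.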
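A minimal sketch of the `bind_rows()` approach, using small hypothetical data frames in place of the four taxi tables. It also shows an alternative that keeps `union_all()` by applying it pairwise with base R's `Reduce()`, on the assumption (as above) that `union_all()` only combines `x` and `y`.

```r
library(dplyr)

# Hypothetical toy stand-ins for the four yearly taxi tables;
# the last two still have the extra congestion_surcharge column
taxi_2017 <- data.frame(fare = c(5, 7), tip = c(1, 0))
taxi_2018 <- data.frame(fare = c(9, 4, 6), tip = c(2, 1, 0))
taxi_2019 <- data.frame(fare = c(3, 8), tip = c(0, 1), congestion_surcharge = c(2.5, 0))
taxi_2020 <- data.frame(fare = 10, tip = 2, congestion_surcharge = 2.5)

# bind_rows() stacks any number of data frames; columns are matched by name
# and a column missing from some inputs is filled with NA
all_rows <- bind_rows(taxi_2017, taxi_2018, taxi_2019, taxi_2020)

# Alternative that keeps union_all(): apply it two tables at a time over a list
# (the extra column is dropped first, as in the question)
trimmed <- list(taxi_2017, taxi_2018,
                select(taxi_2019, -congestion_surcharge),
                select(taxi_2020, -congestion_surcharge))
all_rows_union <- Reduce(union_all, trimmed)
```

Because `bind_rows()` fills missing columns with `NA`, removing `congestion_surcharge` beforehand is optional; keeping it preserves the 2019 and 2020 values. `bind_rows()` also accepts a single list, e.g. `bind_rows(trimmed)`.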

BPeif
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 18 '21 at 20:05
  • I will try that, thank you. – Abe Diaz Oct 19 '21 at 21:02