I'm attempting to work with the NYC 2013 taxi trip data set in MongoDB. It has about 170 million records spread over several CSV files, which I imported using mongoimport. The strings and numbers were imported with the correct types, but the pickup and drop-off timestamps are still strings. I know the usual way to fix this:
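The snippet itself did not survive here. As a minimal sketch of what such a per-document conversion usually looks like in the mongo shell, assuming a collection called `trips`, fields named `pickup_datetime` and `dropoff_datetime`, and timestamps of the form `YYYY-MM-DD HH:MM:SS` interpreted as UTC (all of these are illustrative assumptions, not taken from the original):

```javascript
// Hypothetical field names; adjust them to the actual CSV headers.
const DATE_FIELDS = ["pickup_datetime", "dropoff_datetime"];

// Turn "YYYY-MM-DD HH:MM:SS" into a Date, treating the value as UTC
// (an assumption about the data, not something the source states).
function parseTimestamp(s) {
  return new Date(s.replace(" ", "T") + "Z");
}

// For each date field, fetch every document where the field is still a
// string and write the parsed Date back, one document at a time.
function convertTimestamps(coll) {
  let updated = 0;
  DATE_FIELDS.forEach(function (field) {
    coll.find({ [field]: { $type: "string" } }).forEach(function (doc) {
      coll.updateOne(
        { _id: doc._id },
        { $set: { [field]: parseTimestamp(doc[field]) } }
      );
      updated += 1;
    });
  });
  return updated;
}

// In the mongo shell this would be called as: convertTimestamps(db.trips);
```

Every document makes a round trip through the client in this loop, once per field.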
But that results in each of the 170M records being fetched from the database and the replacement date then being sent back. At the current rate, it looks like it will take at least 2 days to convert both fields in all the records. The database is spread across 4 shards, and those machines are barely doing anything during this process. Is there a faster way to do the conversion that uses more of the database's resources?