I am trying to combine the data from nine CSVs, all related to each other, into one database.
The issue I'm having is that the base CSV has 5 million records, and each one needs information from the other eight (also large) CSVs to form one complete record; building a single complete record currently takes over a minute.
Here is a simplified view of the problem.
The base CSV represents a Vehicle:
Vehicle {
vehicle_id,
engine_id,
maintenance_id,
veh_eng_maintenance_id,
}
Where maintenance_id is the primary key of a maintenance object, and there are also intermediate lookup steps.
Lookup
{
lookup_id,
veh_eng_maintenance_id,
schedule_id,
}
Where schedule_id is the primary key of a schedule object from another CSV, and veh_eng_maintenance_id comes from the vehicle. One vehicle can match several lookup rows, which is why a vehicle ends up with an array of schedules.
My goal is to create a collection in my MongoDB database made up of complete Vehicle documents:
Vehicle {
vehicle_id,
engine_id,
maintenance {
description,
name,
},
schedules [
schedule {
name,
description,
date,
}
]
}
Right now I'm loading the CSVs using C#, creating a MongoDB collection and a C# class for each, and then iterating over the vehicles collection (all 5 million records), querying all of the other collections to build each completed vehicle record.
But this takes far too long, and it is also too slow to query for an individual vehicle on the fly rather than building the full vehicle collection beforehand. Is there a fast way to combine extremely large collections, or a quicker way to query?
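In simplified form, my per-record assembly step looks roughly like this (collection, class, and property names follow the simplified schema above; the real code has more collections and fields):

```csharp
// Simplified sketch of the current approach, using the MongoDB C# driver.
// "db" is an IMongoDatabase; class names mirror the schema above.
var vehicles    = db.GetCollection<Vehicle>("vehicles");
var maintenance = db.GetCollection<Maintenance>("maintenance");
var lookups     = db.GetCollection<Lookup>("lookups");
var schedules   = db.GetCollection<Schedule>("schedules");
var completed   = db.GetCollection<CompleteVehicle>("complete_vehicles");

foreach (var v in vehicles.Find(FilterDefinition<Vehicle>.Empty).ToEnumerable())
{
    // One query to resolve the maintenance record...
    var m = maintenance.Find(x => x.MaintenanceId == v.MaintenanceId).FirstOrDefault();

    // ...one to find the matching lookup rows...
    var scheduleIds = lookups
        .Find(l => l.VehEngMaintenanceId == v.VehEngMaintenanceId)
        .ToList()
        .Select(l => l.ScheduleId)
        .ToList();

    // ...and one more to fetch all of that vehicle's schedules.
    var s = schedules.Find(x => scheduleIds.Contains(x.ScheduleId)).ToList();

    completed.InsertOne(new CompleteVehicle(v, m, s));
}
```

Each iteration issues several separate queries, and multiplied across 5 million vehicles that per-record round-tripping seems to be where all the time goes.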