I need to merge 5 collections in a MongoDB database on a couple of field names and return the result as a CSV file. I can read the collections into pandas using the `DataFrame.from_records` method without a problem, and I can merge a subset of them using `pd.merge`. The issue is that each data frame I want to merge has 20,000+ columns and 100,000+ rows, so the merge is extremely slow at that size.
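For reference, here's a minimal sketch of what I'm doing now. The collection names, join keys (`field_a`, `field_b`), and connection details are just placeholders for my real ones:

```python
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]  # placeholder database name

# Read each collection into a DataFrame (excluding Mongo's _id field).
frames = [
    pd.DataFrame.from_records(db[name].find({}, {"_id": 0}))
    for name in ["coll1", "coll2", "coll3", "coll4", "coll5"]
]

# Merge pairwise on the shared key columns -- this is the slow part.
merged = frames[0]
for df in frames[1:]:
    merged = pd.merge(merged, df, on=["field_a", "field_b"])

merged.to_csv("merged.csv", index=False)
```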
I've never dealt with data of this magnitude before -- what are some ways I can speed this process up? Or is pandas simply not the right tool at this point?