I have a 15 GB file with more than 25 million rows in the following JSON format (which is accepted by MongoDB for importing):
[
{"_id": 1, "value": "\u041c\..."}
{"_id": 2, "value": "\u041d\..."}
...
]
When I try to import it into MongoDB with the following command, I get a speed of only 50 rows per second, which is really slow for me.
mongoimport --db wordbase --collection sentences --type json --file C:\Users\Aleksandar\PycharmProjects\NLPSeminarska\my_file.json -jsonArray
When I tried to insert the data into the collection using Python with pymongo, the speed was even worse. I also tried increasing the priority of the process, but it didn't make any difference.
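For reference, my pymongo attempt looked roughly like this (a sketch; it assumes one document per line, as in the format above, and inserts one document at a time):

import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["wordbase"]["sentences"]

with open(r"C:\Users\Aleksandar\PycharmProjects\NLPSeminarska\my_file.json", encoding="utf-8") as f:
    for line in f:
        line = line.strip().rstrip(",")          # one JSON document per line, as in the format above
        if line in ("[", "]", ""):               # skip the surrounding array brackets
            continue
        collection.insert_one(json.loads(line))  # one insert per document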
The next thing I tried was the same import but without -jsonArray, and although I got a big speed increase (~4000 docs/sec), it failed saying that the BSON representation of the supplied JSON is too large.
I also tried splitting the file into 5 separate files and importing them from separate consoles into the same collection, but then the speed of each of them dropped to about 20 documents/sec.
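This is roughly how I did the splitting (a sketch; it assumes one document per line, as in the format above, and the part_*.json output names are illustrative):

SRC = r"C:\Users\Aleksandar\PycharmProjects\NLPSeminarska\my_file.json"
PARTS = 5

outputs = [open("part_%d.json" % i, "w", encoding="utf-8") for i in range(PARTS)]
first = [True] * PARTS
for out in outputs:
    out.write("[\n")                      # each part must itself be a valid JSON array for -jsonArray

with open(SRC, encoding="utf-8") as f:
    doc_index = 0
    for line in f:
        line = line.strip().rstrip(",")   # one document per line, as in the format above
        if line in ("[", "]", ""):        # skip the original array brackets
            continue
        target = doc_index % PARTS        # distribute documents round-robin over the parts
        if not first[target]:
            outputs[target].write(",\n")
        outputs[target].write(line)
        first[target] = False
        doc_index += 1

for out in outputs:
    out.write("\n]\n")
    out.close()

I then ran the same mongoimport command against each part from its own console.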
While searching all over the web, I saw that people reported speeds of over 8K documents/sec, and I can't see what I'm doing wrong.
Is there a way to speed this up, or should I convert the whole JSON file to BSON and import it that way? If so, what is the correct way to do both the conversion and the import?
Huge thanks.