I want to import a large collection of nested JSON objects into a MongoDB database. It is common practice, under certain circumstances, to represent such nested relationships using references between collections rather than directly embedded documents.
Here is a concrete example. Suppose I had tens of gigabytes worth of JSON in the following format, where children is occasionally thousands of objects long, and each object has dozens of keys.
    {
        "a" : 1,
        "b" : 2,
        "children" : [
            {
                "x": "some long, complicated thing",
                "y": [5, 6],
                "huge_image": "..."
            },
            {
                "x": "some other complicated thing",
                "y": [1, 2, 3],
                "huge_image": "..."
            },
            ...
        ]
    }
It seems straightforward that I might want to import this as two collections, parents and children. (Indeed, I may have to if the children are extremely large documents, such as media.) Yet I cannot find any information on how to efficiently import existing nested data into MongoDB as multiple collections.
mongoimport takes only one collection argument. One can certainly import the data into one collection, then manually construct the second collection from the first and modify each entry of the first, but this seems both labor-intensive and inefficient for what surely must be a common problem.
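For concreteness, here is a rough sketch of the kind of manual splitting I would like to avoid writing myself, done before import rather than after. It rests on several assumptions that are mine and not part of the data above: the input is one JSON document per line, the parents carry no _id of their own so a simple counter can supply one, and the back-reference field is called parent_id (a made-up name). The function names are hypothetical as well.

```python
import json
from itertools import count


def split_document(doc, ids):
    """Split one nested document into a parent record and its child records."""
    parent_id = next(ids)
    # Keep every top-level field except the embedded array.
    parent = {k: v for k, v in doc.items() if k != "children"}
    parent["_id"] = parent_id  # assumption: the source has no _id of its own
    # Each child gets a back-reference to its parent (hypothetical field name).
    children = [dict(child, parent_id=parent_id)
                for child in doc.get("children", [])]
    return parent, children


def split_stream(lines):
    """Yield (collection_name, json_line) pairs from newline-delimited JSON.

    Works one input line at a time, so the whole data set is never
    held in memory at once.
    """
    ids = count(1)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parent, children = split_document(json.loads(line), ids)
        yield "parents", json.dumps(parent)
        for child in children:
            yield "children", json.dumps(child)
```

Each yielded line could be appended to one of two output files according to its collection name, and mongoimport could then be run once per file with a different --collection value. That still means a full extra pass over tens of gigabytes, which is exactly the overhead I was hoping to avoid.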
Is there something I'm missing here?