16

I am trying to dump a collection to a .json file, but after looking through the PyMongo tutorial I can't find anything that relates to it.

Tutorial link: https://api.mongodb.com/python/current/tutorial.html

AnhNg
  • https://docs.mongodb.com/manual/reference/program/mongoexport/ – Alex Blex Mar 07 '18 at 13:33
  • Does this answer your question? [PyMongo/Mongoengine equivalent of mongodump](https://stackoverflow.com/questions/24610484/pymongo-mongoengine-equivalent-of-mongodump) – Phoenix Dec 14 '19 at 09:59
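
For a one-off export, the `mongoexport` tool linked in the comment above can also be invoked from Python; a minimal sketch, assuming the MongoDB database tools are installed and using placeholder database/collection names:

import subprocess

# --jsonArray writes a single JSON array instead of one document per line
subprocess.run(
    ["mongoexport",
     "--db", "db_name",
     "--collection", "collection_name",
     "--jsonArray",
     "--out", "collection.json"],
    check=True,
)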

7 Answers

16

Just get all the documents and save them to a file, e.g.:

from bson.json_util import dumps
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        file.write('[')
        for document in cursor:
            file.write(dumps(document))
            file.write(',')
        file.write(']')
kamillitw
  • How to save them to file exactly? – AnhNg Mar 07 '18 at 14:52
  • @AnhNg I've added an example, take a look. – kamillitw Mar 08 '18 at 08:52
  • This gives errors like this: `TypeError: Object of type 'ObjectId' is not JSON serializable` – JCGB Sep 10 '19 at 10:35
  • Got the same `TypeError`. You can solve it by importing `from bson.json_util import dumps` and replacing `file.write(json.dumps(document))` with `file.write(dumps(document))`. [Learn more](https://stackoverflow.com/questions/16586180/typeerror-objectid-is-not-json-serializable) – darorck May 05 '20 at 22:21
  • This actually produces invalid JSON: the last `file.write(',')` before `file.write(']')` leaves the file ending in `,]`. – garyj Oct 02 '21 at 02:55
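
To illustrate the `TypeError` from the comments and the suggested fix, a minimal sketch (the document content is made up):

import json
from bson import ObjectId
from bson.json_util import dumps

document = {"_id": ObjectId(), "test": "test1"}

# json.dumps(document) raises:
#   TypeError: Object of type ObjectId is not JSON serializable
print(dumps(document))                    # bson.json_util encodes ObjectId as {"$oid": "..."}
print(json.dumps(document, default=str))  # alternative: fall back to str() for unknown types
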
16

The accepted solution produces invalid JSON: it leaves a trailing comma `,` before the closing square bracket `]`, and the JSON spec does not allow trailing commas. See this answer and this reference.

To build on the accepted solution I used the following:

from bson.json_util import dumps
from pymongo import MongoClient
import json

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
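        # dumps() reads the whole cursor and serializes it to one JSON array string;
        # loads()/dump() then round-trip it into the output file as standard JSON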
        json.dump(json.loads(dumps(cursor)), file)
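
A quick way to check that the exported file is valid JSON (assuming the same collection.json path):

import json

with open('collection.json') as file:
    data = json.load(file)  # raises json.JSONDecodeError if the file is not valid JSON
print(len(data), 'documents exported')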
garyj
2

Here's another way to avoid writing a `,` before the closing square bracket, also using `with open` to save some space.

filter = {"type": "something"}
type_documents = db['cluster'].find(filter)
type_documents_count = db['cluster'].count_documents(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    # Start from one as type_documents_count also starts from 1.
    for i, document in enumerate(type_documents, 1):
        file.write(json.dumps(document, default=str))
        if i != type_documents_count:
            file.write(',')
    file.write(']')

It simply skips the comma when the iteration count equals the number of documents, i.e. after the last document is written.
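
If the result set comfortably fits in memory, the same valid JSON can be produced without counting documents at all; a minimal sketch along the same lines, reusing the `db['cluster']` collection and filter from above:

import json

with open("type_documents.json", "w") as file:
    documents = db['cluster'].find({"type": "something"})
    # join the serialized documents with commas and wrap them in brackets
    file.write('[' + ','.join(json.dumps(doc, default=str) for doc in documents) + ']')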

robscott
  • Wouldn't it be smarter to add the comma before the dump instead of after? This way you don't need the total count of documents, you can just check if it's the first iteration. Replace `if i != type_documents_count:` with `if i != 1:` and get rid of the count_documents line. – rabbibillclinton Aug 13 '23 at 16:11
1

Complementing @kamillitw's answer, I use the length of the cursor to build the JSON file correctly, using count() and if/else:

import json
from bson import json_util

def writeToJSONFile(collection):
    cursor = collection.find({})
    file = open("collection.json", "w")
    file.write('[')
    qnt_cursor = 0
    for document in cursor:
        qnt_cursor += 1
        num_max = cursor.count()  # Cursor.count() is deprecated (removed in PyMongo 4)
        if (num_max == 1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
        elif (num_max >= 1 and qnt_cursor <= num_max-1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
            file.write(',')
        elif (qnt_cursor == num_max):
            file.write(json.dumps(document, indent=4, default=json_util.default))
    file.write(']')
    return file

So the JSON file will be correct in the end: before, it was written like `[{"test": "test"},]`, and now it's written like `[{"test":"test1"},{"test":"test2"}]`.
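
Note that `Cursor.count()` was deprecated and later removed in PyMongo 4, so on newer drivers the total has to come from the collection instead; a small sketch of the substitution, using the same `collection` object the function receives:

# PyMongo 4+: count on the collection rather than on the cursor
num_max = collection.count_documents({})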

Naiara Andrade
1
"""
@Author: Aseem Jain
@profile: https://www.linkedin.com/in/premaseem/

"""
import os
import pymongo
from bson.json_util import dumps

# configure credentials / db name
db_user = os.environ["MONGO_ATLAS_USER"]
db_pass = os.environ["MONGO_ATLAS_PASSWORD"]
db_name = "sample_mflix"

connection_string = f"mongodb+srv://{db_user}:{db_pass}@sharedcluster.lv3wx.mongodb.net/{db_name}?retryWrites=true&w=majority"

client = pymongo.MongoClient(connection_string)
db = client[db_name]

# create database backup directory named after db_name
os.makedirs(db_name, exist_ok=True)

# list all tables in database
tables = db.list_collection_names()

# dump all tables in db
for table in tables:
    print("exporting data for table", table )
    data = list(db[table].find())
    # write data to a JSON file (json_util.dumps handles ObjectId, dates, etc.)
    with open(f"{db.name}/{table}.json", "w") as writer:
        writer.write(dumps(data))

exit(0)
Aseem Jain
0

Using pymongo's json_util:

from bson.json_util import dumps
from pymongo import MongoClient

# mongo_connection_string is assumed to be defined elsewhere
db_client = MongoClient(mongo_connection_string)
db = db_client.db_name
collection = db.collection_name

with open("collection.json", 'w') as file:
    for document in collection.find():
        op_json = dumps(document)
        file.write(op_json + '\n')  # one JSON document per line
Livne Rosenblum
0

I liked @robscott's answer as it seemed the most intuitive while also not creating invalid JSON. Here is a simplified version that requires no document count: instead of writing the comma after each dump, it writes it before.

The idea is the same, though: a comma is written before every document except the first.

filter = {"type": "something"}
type_documents = db['cluster'].find(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    for i, document in enumerate(type_documents, 1):
        if i != 1:
            file.write(',')
        file.write(json.dumps(document, default=str))
    file.write(']')
rabbibillclinton