16

I am trying to dump a collection to a .json file, but after looking through the PyMongo tutorial I can't find anything that relates to it.

Tutorial link: https://api.mongodb.com/python/current/tutorial.html

AnhNg
  • https://docs.mongodb.com/manual/reference/program/mongoexport/ – Alex Blex Mar 07 '18 at 13:33
  • Does this answer your question? [PyMongo/Mongoengine equivalent of mongodump](https://stackoverflow.com/questions/24610484/pymongo-mongoengine-equivalent-of-mongodump) – Phoenix Dec 14 '19 at 09:59
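
For a one-off export, the `mongoexport` tool linked in the comment above can also be invoked from Python; a minimal sketch, assuming the MongoDB database tools are installed and using placeholder database/collection names:

import subprocess

# --jsonArray writes a single JSON array instead of one document per line
subprocess.run(
    ["mongoexport",
     "--db", "db_name",
     "--collection", "collection_name",
     "--jsonArray",
     "--out", "collection.json"],
    check=True,
)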

7 Answers

16

Just get all the documents and save them to a file, e.g.:

from bson.json_util import dumps
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        file.write('[')
        for document in cursor:
            file.write(dumps(document))
            file.write(',')
        file.write(']')
kamillitw
  • How to save them to file exactly? – AnhNg Mar 07 '18 at 14:52
  • @AnhNg I've added an example, take a look. – kamillitw Mar 08 '18 at 08:52
  • This gives errors like this: `TypeError: Object of type 'ObjectId' is not JSON serializable` – JCGB Sep 10 '19 at 10:35
  • Got the same `TypeError`. You can solve it by importing `from bson.json_util import dumps` and replacing `file.write(json.dumps(document))` with `file.write(dumps(document))`. [Learn more](https://stackoverflow.com/questions/16586180/typeerror-objectid-is-not-json-serializable) – darorck May 05 '20 at 22:21
  • This actually produces invalid JSON: the last `file.write(',')` before `file.write(']')` leaves the file ending in `,]`. – garyj Oct 02 '21 at 02:55
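
To illustrate the `TypeError` from the comments and the suggested fix, a minimal sketch (the document content is made up):

import json
from bson import ObjectId
from bson.json_util import dumps

document = {"_id": ObjectId(), "test": "test1"}

# json.dumps(document) raises:
#   TypeError: Object of type ObjectId is not JSON serializable
print(dumps(document))                    # bson.json_util encodes ObjectId as {"$oid": "..."}
print(json.dumps(document, default=str))  # alternative: fall back to str() for unknown types
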
16

The accepted solution produces invalid JSON: it leaves a trailing comma `,` before the closing square bracket `]`, and the JSON spec does not allow trailing commas. See this answer and this reference.

To build on the accepted solution I used the following:

from bson.json_util import dumps
from pymongo import MongoClient
import json

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
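        # dumps() reads the whole cursor and serializes it to one JSON array string;
        # loads()/dump() then round-trip it into the output file as standard JSON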
        json.dump(json.loads(dumps(cursor)), file)
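
A quick way to check that the exported file is valid JSON (assuming the same collection.json path):

import json

with open('collection.json') as file:
    data = json.load(file)  # raises json.JSONDecodeError if the file is not valid JSON
print(len(data), 'documents exported')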
garyj
2

Here's another way to avoid writing a `,` before the closing square bracket, also using `with open` to save some space.

filter = {"type": "something"}
type_documents = db['cluster'].find(filter)
type_documents_count = db['cluster'].count_documents(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    # Start from one as type_documents_count also starts from 1.
    for i, document in enumerate(type_documents, 1):
        file.write(json.dumps(document, default=str))
        if i != type_documents_count:
            file.write(',')
    file.write(']')

It simply skips the comma when the iteration count equals the number of documents, i.e. after the last document is written.
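
If the result set comfortably fits in memory, the same valid JSON can be produced without counting documents at all; a minimal sketch along the same lines, reusing the `db['cluster']` collection and filter from above:

import json

with open("type_documents.json", "w") as file:
    documents = db['cluster'].find({"type": "something"})
    # join the serialized documents with commas and wrap them in brackets
    file.write('[' + ','.join(json.dumps(doc, default=str) for doc in documents) + ']')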

robscott
  • Wouldn't it be smarter to add the comma before the dump instead of after? This way you don't need the total count of documents, you can just check if it's the first iteration. Replace `if i != type_documents_count:` with `if i != 1:` and get rid of the count_documents line. – rabbibillclinton Aug 13 '23 at 16:11
1

Complementing @kamillitw's answer, I use the length of the cursor to build the JSON file correctly, using count() and if/else:

import json
from bson import json_util

def writeToJSONFile(collection):
    cursor = collection.find({})
    file = open("collection.json", "w")
    file.write('[')
    qnt_cursor = 0
    for document in cursor:
        qnt_cursor += 1
        num_max = cursor.count()  # Cursor.count() is deprecated (removed in PyMongo 4)
        if (num_max == 1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
        elif (num_max >= 1 and qnt_cursor <= num_max-1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
            file.write(',')
        elif (qnt_cursor == num_max):
            file.write(json.dumps(document, indent=4, default=json_util.default))
    file.write(']')
    return file

So the JSON file will be correct in the end: before, it was written like `[{"test": "test"},]`, and now it's written like `[{"test":"test1"},{"test":"test2"}]`.
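
Note that `Cursor.count()` was deprecated and later removed in PyMongo 4, so on newer drivers the total has to come from the collection instead; a small sketch of the substitution, using the same `collection` object the function receives:

# PyMongo 4+: count on the collection rather than on the cursor
num_max = collection.count_documents({})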

Naiara Andrade
1
"""
@Author: Aseem Jain
@profile: https://www.linkedin.com/in/premaseem/

"""
import os
import pymongo
from bson.json_util import dumps

# configure credentials / db name
db_user = os.environ["MONGO_ATLAS_USER"]
db_pass = os.environ["MONGO_ATLAS_PASSWORD"]
db_name = "sample_mflix"

connection_string = f"mongodb+srv://{db_user}:{db_pass}@sharedcluster.lv3wx.mongodb.net/{db_name}?retryWrites=true&w=majority"

client = pymongo.MongoClient(connection_string)
db = client[db_name]

# create database backup directory named after db_name
os.makedirs(db_name, exist_ok=True)

# list all tables in database
tables = db.list_collection_names()

# dump all tables in db
for table in tables:
    print("exporting data for table", table )
    data = list(db[table].find())
    # write data to a JSON file (json_util.dumps handles ObjectId, dates, etc.)
    with open(f"{db.name}/{table}.json", "w") as writer:
        writer.write(dumps(data))

exit(0)
Aseem Jain
0

Using pymongo's json_util:

from bson.json_util import dumps
from pymongo import MongoClient

# mongo_connection_string is assumed to be defined elsewhere
db_client = MongoClient(mongo_connection_string)
db = db_client.db_name
collection = db.collection_name

with open("collection.json", 'w') as file:
    for document in collection.find():
        op_json = dumps(document)
        file.write(op_json + '\n')  # one JSON document per line
Livne Rosenblum
0

I liked @robscott's answer as it seemed the most intuitive while also not creating invalid JSON. Here is a simplified version that requires no document count: instead of writing the comma after each dump, it writes it before.

The idea is the same, though: a comma is written before every document except the first.

filter = {"type": "something"}
type_documents = db['cluster'].find(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    for i, document in enumerate(type_documents, 1):
        if i != 1:
            file.write(',')
        file.write(json.dumps(document, default=str))
    file.write(']')
rabbibillclinton