
I'm currently trying to send data to an Azure DocumentDB collection from Python (using the pydocumentdb library). I have to send about 100,000 documents to this collection, and it takes a very long time (about 2 hours).

I send each document one by one using:

for document in documents:
    client.CreateDocument(collection_link, document)

Am I doing something wrong? Is there another, faster way to do it, or is it normal that it takes this long?

Thanks !

Gary Liu
  • I don't think there is any such thing as a batch insert here. There is a way to do what you want in .NET. Look into this answer: https://stackoverflow.com/questions/41744582/fastest-way-to-insert-100-000-records-into-documentdb – crazyglasses Jun 22 '17 at 10:36
  • My guess is that the CosmosDB Python SDK operations are all synchronous. This means that one call to `client.CreateDocument()` must complete its full round trip before it will go to the next document in the loop. This is incredibly inefficient. You need to get more parallelism or bigger batches in your round trips. Not sure how you do the former in Python, but the latter can be accomplished by using a stored procedure where you send in an array of JSON documents (not all 100,000, but maybe 1,000 at a time) as input to the sproc. – Larry Maccherone Jun 22 '17 at 14:05
  • Another option is to bypass the CosmosDB Python SDK and make REST calls directly. Here's how you make a batch of parallel requests: https://stackoverflow.com/questions/9110593/asynchronous-requests-with-python-requests. The difficulty with this approach is usually composing the authentication token but you may be able to extract that from the Python SDK or find another SO answer that explains this. – Larry Maccherone Jun 22 '17 at 14:28
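Following the parallelism suggestion in the comments above, here is a minimal sketch. It assumes the `client` and `collection_link` objects from the question; the worker count is a placeholder you would tune against your collection's provisioned throughput (RU/s).

```python
# Sketch: issue the synchronous CreateDocument calls from a thread pool so
# several round trips are in flight at once, instead of one at a time.
from concurrent.futures import ThreadPoolExecutor

def insert_all(client, collection_link, documents, workers=10):
    # `workers` is a guess; too many parallel inserts can hit rate limiting
    # (HTTP 429) on a low-throughput collection.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(client.CreateDocument, collection_link, doc)
                   for doc in documents]
        for future in futures:
            future.result()  # surface any failed insert
```

This keeps each call synchronous but overlaps the network latency of many calls, which is usually where the two hours go.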

1 Answer


On Azure, there are several ways to import data into Cosmos DB that are faster than the PyDocumentDB API, which wraps the underlying REST APIs over HTTP.

First, prepare a JSON file containing your 100,000 documents; then you can follow the documents below to import the data.

  1. Refer to the document How to import data into Azure Cosmos DB for the DocumentDB API? to import the JSON data file via the DocumentDB Data Migration Tool.
  2. Refer to the document Azure Cosmos DB: How to import MongoDB data? to import the JSON data file via MongoDB's mongoimport tool.
  3. Upload the JSON data file to Azure Blob Storage, then copy the data from Blob Storage to Cosmos DB using Azure Data Factory; please see the section Example: Copy data from Azure Blob to Azure Cosmos DB for more details.
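For option 2, the mongoimport invocation looks roughly like the command below. The account name, password, database, and collection names are placeholders; copy the real connection values from the Connection String blade in the Azure portal.

```shell
# Bulk-load a JSON array file through Cosmos DB's MongoDB endpoint.
# <myaccount> and <primary-password> are placeholders from the Azure portal.
mongoimport --host myaccount.documents.azure.com:10255 --ssl \
    --username myaccount --password <primary-password> \
    --db mydb --collection mycollection \
    --file documents.json --jsonArray
```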

If you want to import the data programmatically, you can try using a Python MongoDB driver to connect to Azure Cosmos DB and import the data via the MongoDB wire protocol; please refer to the document Introduction to Azure Cosmos DB: API for MongoDB.
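A minimal sketch of that programmatic route, assuming a Cosmos DB account with the MongoDB API enabled. The connection URI, database, and collection names below are placeholders, and pymongo is an extra dependency (`pip install pymongo`).

```python
def import_documents(uri, documents, batch_size=1000):
    # pymongo speaks the MongoDB wire protocol that Cosmos DB exposes.
    from pymongo import MongoClient

    collection = MongoClient(uri)["mydb"]["mycollection"]
    # insert_many ships each batch in one request, which is far fewer
    # round trips than one CreateDocument call per document.
    for batch in batches(documents, batch_size):
        collection.insert_many(batch)

def batches(documents, batch_size):
    # Split the document list into fixed-size chunks.
    for i in range(0, len(documents), batch_size):
        yield documents[i:i + batch_size]
```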

Hope it helps.

Peter Pan