
I have a collection of data in Google Cloud Firestore. The collection has over 200K documents, and I want to export each document as a line to a file.

I created a script which is working fine for 50K rows. After that it crashes with the following exception. How can I get all the documents?

I saw something called offset, but I'm not sure it helps in my situation.

Code Snippet:

from google.cloud import firestore
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "key.json"


db = firestore.Client()
col = db.collection(u'data')
docs = col.get()

with open('data.bak', 'a') as f:
    for doc in docs:
        f.write(u'{} => {}'.format(doc.id, doc.to_dict()))
        f.write('\n')

Exception:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "down_db.py", line 13, in <module>
    for doc in docs:
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/firestore_v1beta1/query.py", line 744, in get
    for index, response_pb in enumerate(response_iterator):
  File "/usr/local/lib/python3.6/dist-packages/google/api_core/grpc_helpers.py", line 81, in next
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 The datastore operation timed out, or the data was temporarily unavailable.
RMK
  • How about loading data in smaller chunks? – Alex Mamo Jan 28 '19 at 11:32
  • I am not inserting data into Firestore, but downloading it. I don't have control over limiting the data or where clauses. – RMK Jan 28 '19 at 11:45
  • That's also what I'm talking about, downloading data in smaller chunks. Oh, you have. Check [this](https://firebase.google.com/docs/firestore/query-data/get-data) out. – Alex Mamo Jan 28 '19 at 11:48
  • Thanks @AlexMamo, I will try that. Later I will try [query cursors](https://cloud.google.com/firestore/docs/query-data/query-cursors) – RMK Jan 28 '19 at 11:53
  • Possible duplicate of [Why aren't all my documents updated in Firestore?](https://stackoverflow.com/questions/52360438/why-arent-all-my-documents-updated-in-firestore) – Juan Lara Feb 04 '19 at 18:30

3 Answers


The Cloud Firestore Python client has a 20-second timeout for get(). Try breaking the work up into smaller queries, or try fetching all the document references first and then iterating over them:

docs = [snapshot.reference for snapshot in col.get()]
for doc in docs:
    ...
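
For illustration, here is how that reference-based approach might be combined with the write loop from the question. This is only a sketch (not tested at 200K-document scale): the idea is to drain the query into a plain Python list quickly, then do the slow file writing afterwards, re-reading each document through its reference:

from google.cloud import firestore

db = firestore.Client()
col = db.collection(u'data')

# Materialize the references quickly, before the streamed query can time out.
refs = [snapshot.reference for snapshot in col.get()]

with open('data.bak', 'a') as f:
    for ref in refs:
        doc = ref.get()  # one small read per document reference
        f.write(u'{} => {}\n'.format(doc.id, doc.to_dict()))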

GitHub issue regarding the timeout
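
If you would rather break the work up, a paginated loop with query cursors (as mentioned in the comments on the question) might look roughly like the sketch below. The batch size and the ordering by document ID are my own assumptions, not part of the original answer:

from google.cloud import firestore

db = firestore.Client()
col = db.collection(u'data')

batch_size = 1000  # assumed batch size; tune as needed
last_snapshot = None

with open('data.bak', 'a') as f:
    while True:
        # Order by document ID so the cursor can resume deterministically.
        query = col.order_by(u'__name__').limit(batch_size)
        if last_snapshot is not None:
            query = query.start_after(last_snapshot)
        batch = list(query.get())
        if not batch:
            break
        for doc in batch:
            f.write(u'{} => {}\n'.format(doc.id, doc.to_dict()))
        last_snapshot = batch[-1]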

Juan Lara

There's another approach which I think would work, using the gcloud command-line tool. It requires a Cloud Storage bucket and BigQuery, both of which are pretty easy to get going.

  1. Export the collections using the gcloud firestore export command in the terminal:
gcloud beta firestore export gs://[BUCKET_NAME] --collection-ids=[COLLECTION_ID_1],[COLLECTION_ID_2]

Your whole collection will be exported to a GCS bucket. The data format is the same as Cloud Datastore's and is therefore readable through BigQuery, so:

  2. Load the data from the GCS bucket into BigQuery; the exported Firestore collection will live as a table in BigQuery.

  3. Query the table from BigQuery with something like SELECT * FROM [TABLE_NAME]; BigQuery then has an option to download the query result as CSV (a rough Python sketch of steps 2 and 3 follows below).
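
If you want to script steps 2 and 3 rather than use the BigQuery console, a rough sketch with the google-cloud-bigquery Python client could look like this. The bucket, export folder, dataset, and table names below are placeholders, and the export_metadata path is an assumption about the export layout:

from google.cloud import bigquery
import csv

client = bigquery.Client()

# Placeholder names: substitute your own bucket, export folder, dataset, and table.
export_uri = ('gs://[BUCKET_NAME]/[EXPORT_PREFIX]/all_namespaces/'
              'kind_data/all_namespaces_kind_data.export_metadata')
table_ref = client.dataset('my_dataset').table('data')

# Step 2: load the Firestore export into a BigQuery table.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.DATASTORE_BACKUP
client.load_table_from_uri(export_uri, table_ref, job_config=job_config).result()

# Step 3: query the table and write the result out as CSV.
rows = client.query('SELECT * FROM my_dataset.data').result()
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([field.name for field in rows.schema])
    for row in rows:
        writer.writerow(row.values())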

Linh

I created a script which is working fine for 50K rows.

That limit is precisely the number of documents that you can read on a project on the free/Spark plan of Firebase. If your project is on the free plan, you'll need to upgrade it to read more documents per day.

Frank van Puffelen
  • Thanks @Frank van Puffelen. I am using Firestore from a Google Cloud project, which has $300 in credits, and I guess the row limits do not apply to GCP projects. – RMK Jan 29 '19 at 14:07