I'm building a Dataflow streaming app that writes to Firestore. It works when I run it with DirectRunner, but it fails when I run it with DataflowRunner. What is causing the error? My requirements.txt is:

apache-beam[gcp,test]==2.19.0
google-cloud-pubsub
google-cloud-firestore==0.29.0

The line of code that throws the exception is:

from google.cloud import firestore
db = firestore.Client(project=project)
Giuseppe17
  • I found a similar thread, is the answer on here? https://stackoverflow.com/questions/48264536/importerror-failed-to-import-the-cloud-firestore-library-for-python – caleb Kugel May 22 '20 at 19:40
  • Yes, I tried it, but it doesn't work. I noticed that if I put the import inside the function that uses Firestore, the job starts, whereas if I put it at the top of the module, it doesn't. Does that help? – Giuseppe17 May 22 '20 at 19:51
  • Could you share the code where you are doing the import please? – Soni Sol May 22 '20 at 23:21
  • yes, I have updated my post – Giuseppe17 May 23 '20 at 08:04

1 Answer

This is documented in the Dataflow FAQ. There are a few ways to handle it:

  1. Use import statement inside the function definition
  2. Set save_main_session to True in the Pipeline Options
  3. Define the dependencies and organize your folder structure appropriately with requirements.txt and setup.py files
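
As a rough illustration of the first option, here is a minimal sketch that assumes the Firestore write happens in a plain function applied with beam.Map. The function name write_to_firestore, its parameters, and the collection name "events" are hypothetical and not taken from the question; only the import and the firestore.Client(project=project) call come from it.

```python
def write_to_firestore(element, project, collection="events"):
    """Write one dict element to Firestore and pass it through unchanged.

    Hypothetical helper: the name, parameters and default collection
    are assumptions made for this sketch.
    """
    # Deferred import: the module is looked up when the function runs on a
    # Dataflow worker, instead of relying on a module-level name from the
    # launching script being available there.
    from google.cloud import firestore

    # Creating a client per element keeps the sketch short; a real pipeline
    # would more likely create it once per worker (for example in a DoFn).
    db = firestore.Client(project=project)
    db.collection(collection).add(element)
    return element


# Assumed wiring inside the pipeline (not executed here):
#   rows | beam.Map(write_to_firestore, project="my-project")
```

For the second option, the same effect can usually be obtained without moving the import by launching the pipeline with the --save_main_session flag, which pickles the state of the main module and restores it on the workers.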

More details can be found here - https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors

Jayadeep Jayaraman