
I would like to make a POST request from a DoFn in an Apache Beam pipeline running on Dataflow.

For that, I have created a client that instantiates a CloseableHttpClient configured with a PoolingHttpClientConnectionManager.

However, I instantiate a new client for each element that I process.

How can I set up a persistent client that is shared by all my elements?

Also, is there another class that I should use for parallel, high-speed HTTP requests?

Pierre CORBEL

1 Answer


You can keep the client in a member variable, open it in a method annotated with @Setup, and close it in a method annotated with @Teardown. Almost all IO implementations in Beam use this pattern; see JdbcIO for an example.
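As a rough illustration of that pattern, here is a minimal sketch assuming Apache HttpClient 4.x and a reasonably recent Beam Java SDK. The class name PostFn, the endpoint field, and the pool sizes are hypothetical and not taken from the question; the client field is marked transient because a DoFn is serialized before being shipped to workers and the HTTP client itself is not serializable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

/** Hypothetical DoFn that POSTs each element and emits the HTTP status code. */
class PostFn extends DoFn<String, Integer> {
  private final String endpoint; // hypothetical target URL, passed in by the caller

  // Not serializable, so it is transient and created on the worker in setup().
  private transient CloseableHttpClient client;

  PostFn(String endpoint) {
    this.endpoint = endpoint;
  }

  @Setup
  public void setup() {
    // Called once per DoFn instance, before any element is processed.
    PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
    pool.setMaxTotal(20);           // assumed value, tune for your workload
    pool.setDefaultMaxPerRoute(20); // assumed value, tune for your workload
    client = HttpClients.custom().setConnectionManager(pool).build();
  }

  @ProcessElement
  public void processElement(@Element String body, OutputReceiver<Integer> out)
      throws IOException {
    HttpPost post = new HttpPost(endpoint);
    post.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
    // The client is reused across elements; only the response is closed here.
    try (CloseableHttpResponse response = client.execute(post)) {
      // Drain the body so the connection can go back to the pool.
      EntityUtils.consume(response.getEntity());
      out.output(response.getStatusLine().getStatusCode());
    }
  }

  @Teardown
  public void teardown() throws IOException {
    // Called when the runner discards this DoFn instance; safe to call twice.
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
```

It would be applied like any other DoFn, for example with ParDo.of(new PostFn(url)). Note that the runner may create several instances of the DoFn, so the client is shared by the elements handled by one instance rather than by the whole pipeline.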

jkff
  • I believe the Python equivalents are start_bundle and finish_bundle. See https://beam.apache.org/documentation/sdks/pydoc/2.3.0/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.start_bundle – Justin Mar 22 '18 at 04:14