I have a file on Google Cloud Storage that contains a number of queries (insert/update/delete/select). I need to do two things: 1) execute all of the queries, and 2) for select queries, write the result to a file in GCS.

What is the most efficient way to do this in Apache Beam?

Thank You.

rish0097
  • Does the order of these queries matter? If not, is each query on a single line in the file in GCS? If so, you should be able to use `ReadFromText` -> `ParDo(execute each query)` -> `WriteToText`. – Pablo Aug 24 '17 at 21:11
  • @Pablo The order does matter, and each query is not on a single line. Just for test purposes I tried reading such a file, but the order in which the queries appeared was scrambled, as if the queries were mixed into each other. How do I handle this case? – rish0097 Aug 28 '17 at 10:08
  • @Pablo any update on this? – rish0097 Aug 29 '17 at 09:29
  • I am not sure how to implement this. Existing text sources split files into per-line elements, and have no guarantee on the order in which they come into your pipeline. You'd need a source which provides a list of files, which you can use to read files from GCS and parse+execute the queries. – Pablo Aug 29 '17 at 17:18
  • @rish0097 Were you able to implement a solution for this? If so, it is recommended to post it as an answer to better help the community. If not, then a source that reads ordered text using a delimiter other than newline would be a good [feature request](https://cloud.google.com/support/docs/issue-trackers). – Jordan Sep 26 '17 at 13:58
  • Hi @Jordan, yes, I was able to implement this. You might want to have a look at this: https://stackoverflow.com/questions/45920895/read-a-file-from-gcs-in-apache-beam. Basically, I read the file using the FileSystems API, iterated over the queries, and executed them using the BigQuery client libraries. Reading a text file with a delimiter other than newline is something I had asked about a few weeks ago. You also might want to have a look at this: https://stackoverflow.com/questions/45939382/pick-elements-in-processelement-apache-beam. – rish0097 Sep 27 '17 at 04:47
  • Possible duplicate of [Read a file from GCS in Apache Beam](https://stackoverflow.com/questions/45920895/read-a-file-from-gcs-in-apache-beam) – Jordan Sep 27 '17 at 15:18
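
For anyone landing here, the approach described in the comments (read the whole file through the FileSystems API, split it into queries, run them one by one with the BigQuery client library) could look roughly like the sketch below. This is not the code rish0097 actually used. The helper names `split_statements`, `is_select`, and `run_script`, the `;` statement terminator, the "starts with SELECT" check, and the JSON-lines output naming are all assumptions made for illustration. `run_script` needs the `apache-beam[gcp]` and `google-cloud-bigquery` packages; the two helpers above it are plain Python.

```python
import json


def split_statements(script):
    """Split a SQL script into statements, preserving file order.

    Assumption: statements end with ';' and any other semicolon sits
    inside a single-quoted string literal. SQL comments and backslash
    escaped quotes are not handled.
    """
    statements, current, in_string = [], [], False
    for ch in script:
        if ch == "'":
            in_string = not in_string
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def is_select(statement):
    # Naive check (assumption): misses queries that start with WITH.
    return statement.lstrip().upper().startswith("SELECT")


def run_script(script_path, output_prefix):
    """Run every statement in order; write SELECT results as JSON lines."""
    # Imported lazily so the helpers above work without these packages.
    from apache_beam.io.filesystems import FileSystems
    from google.cloud import bigquery

    # Read the whole file at once, so multi-line queries stay intact.
    handle = FileSystems.open(script_path)
    try:
        script = handle.read().decode("utf-8")
    finally:
        handle.close()

    client = bigquery.Client()
    for index, stmt in enumerate(split_statements(script)):
        # result() blocks until the job finishes, so order is preserved.
        rows = client.query(stmt).result()
        if is_select(stmt):
            # Hypothetical naming scheme: one output file per SELECT.
            out = FileSystems.create("%s-%05d.json" % (output_prefix, index))
            try:
                for row in rows:
                    line = json.dumps(dict(row.items()), default=str)
                    out.write((line + "\n").encode("utf-8"))
            finally:
                out.close()
```

Because the statements have to run strictly in order, this is sequential code rather than a parallel `ParDo` over lines. It could be called from a single `DoFn` that receives the file path as its element, or run outside a pipeline altogether.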

0 Answers