Has anything changed recently in the way Google Cloud Dataflow reads compressed files from Google Cloud Storage? I am working on a project that reads compressed CSV log files from GCS and uses them as the source for a Dataflow pipeline. Until recently this worked perfectly, both with and without specifying the file's compression type.
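For reference, the read step is set up roughly like this (a minimal sketch against the 1.x Java SDK; the GCS path and class name are placeholders, not my real code):

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class CompressedReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Compression type stated explicitly ...
    PCollection<String> explicit = p.apply(
        TextIO.Read.named("ReadExplicitGzip")
            .from("gs://my-bucket/logs/events.csv.gz")
            .withCompressionType(TextIO.CompressionType.GZIP));

    // ... and left to auto-detection from the .gz extension (the default).
    PCollection<String> detected = p.apply(
        TextIO.Read.named("ReadAutoDetect")
            .from("gs://my-bucket/logs/events.csv.gz"));

    p.run();
  }
}
```

Both variants show the same behaviour described below.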
Currently the processElement method in my DoFn is only called once (for the CSV header row), even though the file has many rows. If I use the same source file uncompressed, everything works as expected (processElement is called for every row). As suggested in https://stackoverflow.com/a/27775968/6142412, setting the object's Content-Encoding to gzip (e.g. via `gsutil setmeta -h "Content-Encoding:gzip" gs://...`) does work around the problem, but I did not have to do this previously.
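The DoFn itself is nothing special; here is a cut-down version (names are placeholders, building on the `detected` collection from the sketch above):

```java
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

PCollection<String> rows = detected.apply(
    ParDo.named("ParseRows").of(new DoFn<String, String>() {
      @Override
      public void processElement(ProcessContext c) {
        // Expected: one call per line of the CSV.
        // Observed with the gzipped input: a single call, for the header row.
        c.output(c.element());
      }
    }));
```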
I am experiencing this issue with both DirectPipelineRunner and DataflowPipelineRunner, using version 1.5.0 of the Cloud Dataflow SDK for Java.