On a pipeline defined using the latest Apache Beam SDK for Python 2.2.0, I get this error when running a simple pipeline that reads and writes a BigQuery table.
Since a few rows have timestamps with year < 1900, the read operation fails. How can I patch this dataflow_worker package?
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(4d31192aa4aec063): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
def start(self):
File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
with self.scoped_start_state:
File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
with self.spec.source.reader() as reader:
File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
for value in reader:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 198, in __iter__
for record in self.read_next_block():
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", line 95, in read_next_block
yield self.decode_record(record)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 110, in decode_record
record, self.source.table_schema)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 104, in _fix_field_values
record[field.name], field)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 83, in _fix_field_value
return dt.strftime('%Y-%m-%d %H:%M:%S.%f UTC')
ValueError: year=200 is before 1900; the datetime strftime() methods require year >= 1900