I'm trying to read multiple CSV files. Why does this code return <_ReadFromPandas(PTransform) label=[_ReadFromPandas]>? Here's the read_csv source: https://beam.apache.org/releases/pydoc/2.25.0/_modules/apache_beam/dataframe/io.html
pcol_of_dfs = (
    p
    | 'Match files' >> beam.io.fileio.MatchFiles(path)
    | 'Read Files' >> beam.Map(lambda file_meta: beam.dataframe.io.read_csv(file_meta.path))
)
Ultimately, I want to read all the CSV files and add each file's name as an additional column.
I have several hundred gzipped CSV files in a GCS bucket. All of them have an identical set of columns, and all have headers. The CSV values may contain line breaks. Files vary in size from a few KB to ~5 GB.