
I have a DataFrame with 7 columns. The data was originally a fixed-width text file; I read it with Glue's Grok parser and split it into columns using the known column widths.
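For context, the widths-based split is roughly equivalent to the sketch below. The column names and widths are placeholders, not my real schema, and rawDf stands in for the single-string-column DataFrame produced by the parsing step:

from pyspark.sql.functions import substring, trim

# placeholder schema: (column name, 1-based start position, width)
widths = [("col1", 1, 10), ("col2", 11, 5), ("col3", 16, 20)]

parsedDf = rawDf  # rawDf: one string column named "value", one fixed-width record per row
for name, start, width in widths:
    parsedDf = parsedDf.withColumn(name, trim(substring("value", start, width)))
parsedDf = parsedDf.drop("value")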

Now I need to write the resulting DataFrame back to S3 as a fixed-width file, without any delimiter.

Is the approach below the best way to do this? What is the most efficient way to do it in Spark?

Thanks!

I am currently concatenating all the columns into a single column and writing that out:

from pyspark.sql.functions import concat

# merge every column into one string column, then write that single column out via the csv writer
errorDfConcatenated = errorDf.withColumn("col0", concat(*[errorDf[c] for c in errorDf.columns])).select("col0")

errorDfConcatenated.write.format('csv').save(dest_s3_path_error, mode='overwrite')
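For comparison, here is a rough sketch of the same single-column idea, but with each value padded back to its column width via rpad and written with the text format, so nothing extra (no delimiter, no quoting) is added. The widths below are placeholders:

from pyspark.sql.functions import concat, rpad

# placeholder widths, one per column, in the same order as errorDf.columns
col_widths = [10, 5, 20, 8, 8, 12, 30]

padded = [rpad(errorDf[c].cast("string"), w, " ") for c, w in zip(errorDf.columns, col_widths)]
fixedWidthDf = errorDf.select(concat(*padded).alias("value"))

# the text writer expects a single string column and writes each row verbatim, one record per line
fixedWidthDf.write.format("text").save(dest_s3_path_error, mode="overwrite")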
