This post does a great job of showing how to parse a fixed-width text file into a Spark DataFrame with pyspark (pyspark parse text file).
I have several text files I want to parse, but they each have slightly different schemas. Rather than having to write out the same procedure for each one like the previous post suggests, I'd like to write a generic function that can parse a fixed width text file given the widths and column names.
I'm pretty new to pyspark, so I'm not sure how to write a select statement where the number of columns and their types are variable.
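For context, here's a rough sketch of the kind of generic helper I have in mind (the widths, names, and file below are made up for illustration). It builds one `substring()` SQL expression per field from a list of widths and column names, which could then be passed to `selectExpr()` on a DataFrame read with `spark.read.text()`. I don't know if this is the idiomatic way to do it:

```python
def fixed_width_exprs(widths, names):
    """Build one Spark SQL substring() expression per fixed-width field.

    widths: list of field widths in characters
    names:  matching list of output column names
    Note: Spark SQL's substring() is 1-indexed.
    """
    exprs = []
    start = 1
    for width, name in zip(widths, names):
        exprs.append(f"trim(substring(value, {start}, {width})) as {name}")
        start += width
    return exprs

# Hypothetical usage with a file read via spark.read.text("data.txt"),
# whose single string column is named "value":
#   df.selectExpr(*fixed_width_exprs([3, 10, 5], ["id", "name", "score"]))
```

This only produces string columns, though, and I'd also need the types to vary per file.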
Any help would be appreciated!