I have a CSV file that I want to read into a DataFrame.
Here is an example of my file (the last column may contain strings with spaces):
C1 C2 C3
1 2 ab cd
11 12 xx yz
5 6 mm nn pl
I tried to read this file using:
spark.read.csv("myFile", header=True, mode="DROPMALFORMED", sep=' ')
But it fails (all rows are malformed).
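To illustrate what I think is going on, here is a small plain-Python sketch (not Spark; the variable names are just for illustration). It assumes the rows are rejected because splitting on every space gives more fields than the three header columns, and it shows the result I would like instead, where only the first two spaces act as separators:

```python
# Sample lines copied from the example file above.
lines = ["C1 C2 C3", "1 2 ab cd", "11 12 xx yz", "5 6 mm nn pl"]

header = lines[0].split(" ")
n_cols = len(header)  # number of columns declared by the header

# Splitting on every space: each data row ends up with more fields than the header.
naive_counts = [len(row.split(" ")) for row in lines[1:]]

# Splitting only on the first n_cols - 1 spaces keeps the last column intact.
limited = [row.split(" ", n_cols - 1) for row in lines[1:]]

print(n_cols, naive_counts)
for fields in limited:
    print(fields)
```

The second split is the behaviour I am hoping to get from the reader itself, without editing the file.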
To read this file successfully, I first have to modify it (replace the spaces inside the last column with underscores, etc.):
C1 C2 C3
1 2 ab_cd
11 12 xx_yz
5 6 mm_nn_pl
Is there a way to read the file into a DataFrame without changing it?
I also tried the options ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace, without success:
spark.read.csv("myFile", header=True, mode="DROPMALFORMED", sep=' ', ignoreLeadingWhiteSpace=True, ignoreTrailingWhiteSpace=True)
Thanks for the help