I have data in a CSV file as shown below. The first row is blank, and the second row has values only in the last four columns:
    201901 201902 201903 201904
A X 1      0      1      1
B Y 0      0      1      1
A Z 1      0      1      1
B X 1      0      1      1
A Y 0      0      0      1
B Z 1      0      0      1
A X 0      1      0      1
B Y 1      1      0      0
A Z 1      1      0      0
B X 0      1      1      0
If I read the CSV into a Spark DataFrame, I get the following:
_c0 _c1 _c2 _c3 _c4 _c5
null null null null null null
null null 201901 201902 201903 201904
A X 1 0 1 1
B Y 0 0 1 1
A Z 1 0 1 1
B X 1 0 1 1
A Y 0 0 0 1
B Z 1 0 0 1
A X 0 1 0 1
B Y 1 1 0 0
A Z 1 1 0 0
B X 0 1 1 0
I have read the data file without a header and removed the unneeded blank row. Now I want the DataFrame to have a proper header. My current code and the expected result are:
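from pyspark.sql.functions import col
# Read without a header, then drop the fully blank first row
df = spark.read.csv("s3://abc/def/file.csv", header=False)
df = df.where(col("_c3").isNotNull())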
Type Source 201901 201902 201903 201904
A X 1 0 1 1
B Y 0 0 1 1
A Z 1 0 1 1
B X 1 0 1 1
A Y 0 0 0 1
B Z 1 0 0 1
A X 0 1 0 1
B Y 1 1 0 0
A Z 1 1 0 0
B X 0 1 1 0
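
One possible way to get from the filtered DataFrame to the expected output above, as a minimal sketch rather than a definitive solution. It assumes the header row is the only row whose first column is null while its last column is not, and that the two unnamed leading columns should be called Type and Source, as in the expected output. The helper names build_header and promote_header are hypothetical; DataFrame.where, DataFrame.first and DataFrame.toDF are standard PySpark calls.

```python
def build_header(header_row, leading_names=("Type", "Source")):
    """Fill the empty leading header cells with the given names."""
    names = list(header_row)
    for i, name in enumerate(leading_names):
        if names[i] is None:
            names[i] = name
    return [str(n) for n in names]


def promote_header(df, leading_names=("Type", "Source")):
    """Rename the columns from the partially filled header row and drop that row."""
    # Imported here so build_header can be used without Spark installed.
    from pyspark.sql.functions import col

    first_col, last_col = df.columns[0], df.columns[-1]

    # Assumption: the header row has a null first column but a non-null last column.
    header_row = df.where(
        col(first_col).isNull() & col(last_col).isNotNull()
    ).first()
    if header_row is None:
        raise ValueError("no header row found")

    new_names = build_header(header_row, leading_names)

    # Data rows have a value in the first column; this also drops the blank row.
    data = df.where(col(first_col).isNotNull())
    return data.toDF(*new_names)
```

With this sketch, df = promote_header(spark.read.csv("s3://abc/def/file.csv", header=False)) would replace both the filter step and the renaming. Selecting the header row by a null/non-null condition, rather than by position, avoids relying on row order, which Spark does not guarantee in general.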