I tried the following steps:
1. Read a CSV file (columns: A, B, C.A1.T1, C.A1.T2, C.A2.T1, C.A2.T2, ...)
2. Add columns A1 and B1
3. Save as Parquet
4. Read the Parquet file back
5. Drop columns A and B
6. Rename column A1 to A
7. Rename column B1 to B
8. Select A, B, C.A1.T1, C.A1.T2, C.A2.T1, C.A2.T2, ...
Step 8 fails with the following error:
pyspark.sql.utils.AnalysisException: cannot resolve '`C.A1.T1`' given input columns: [A, B, C.A1.T1, C.A1.T2, C.A2.T1, C.A2.T2, ...]
'Project [A#1, B#221, 'C.A1.T1, 'C.A1.T2, 'C.A2.T1, 'C.A2.T2, ... 46 more fields]
+- Project [A#1, B#4, C.A1.T1#5, C.A1.T2#6, ... 47 more fields]
+- Project [A#1, B#4, C.A1.T1#5, C.A1.T2#6, ... 47 more fields]
+- Project ...
...
+- Relation[A#0,B#1,C.A1.T1#5,C.A1.T2#6,... 50 more fields] parquet
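One way to read the error above: when a column-name string such as "C.A1.T1" is passed to select(), Spark treats unquoted dots as separators between a column and nested (struct) fields, so it looks for a column C with a field A1 and a sub-field T1. Wrapping the whole name in backticks makes the dots literal. The sketch below is a simplified, hypothetical Python model of that splitting rule (`split_column_name` is an illustrative name, not a Spark API, and it skips some of Spark's syntax checks):

```python
from typing import List


def split_column_name(name: str) -> List[str]:
    """Hypothetical model of splitting a column-name string into parts.

    Dots separate parts, except inside backticks. Inside backticks,
    a doubled backtick stands for one literal backtick.
    """
    parts: List[str] = []
    current: List[str] = []
    in_backticks = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_backticks:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    current.append("`")  # escaped backtick, keep one
                    i += 1
                else:
                    in_backticks = False  # closing backtick
            else:
                current.append(ch)  # dots are literal here
        elif ch == "`":
            in_backticks = True  # opening backtick
        elif ch == ".":
            parts.append("".join(current))  # unquoted dot ends a part
            current = []
        else:
            current.append(ch)
        i += 1
    if in_backticks:
        raise ValueError(f"unclosed backtick in {name!r}")
    parts.append("".join(current))
    return parts


print(split_column_name("C.A1.T1"))    # three parts: C, A1, T1
print(split_column_name("`C.A1.T1`"))  # one part: C.A1.T1
```

Under this model, the unquoted name never matches the single top-level column literally named C.A1.T1, which is consistent with the "cannot resolve" message even though C.A1.T1 is listed among the input columns.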
Why does a single quotation mark appear to the left of the column names in the plan (e.g. 'C.A1.T1)?
How can I fix it?
** There is no single quotation mark anywhere else: not in the CSV file, not in the printSchema() output, and not in str(df).
** Including the quotation mark in the name does not help: df.select("'C.A1.T1").show() fails with cannot resolve '`'C.A1.T1`'...
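For context: in the plan Spark prints, a leading ' marks an attribute or plan node that the analyzer has not resolved yet; it is not part of the column name, which is why adding it to the select string fails too. A commonly used workaround is to wrap each dotted name in backticks before selecting it. Below is a minimal sketch of such a helper, assuming backticks inside a quoted name are escaped by doubling them (`quote_column` is a hypothetical helper, not part of PySpark):

```python
def quote_column(name: str) -> str:
    """Wrap a column name in backticks so its dots are taken literally.

    Any backtick already in the name is doubled (escaped).
    """
    return "`" + name.replace("`", "``") + "`"


# Column names from the question (the "..." columns are omitted here).
columns = ["A", "B", "C.A1.T1", "C.A1.T2", "C.A2.T1", "C.A2.T2"]
quoted = [quote_column(c) for c in columns]
print(quoted)
```

Assuming `df` is the DataFrame after step 7, step 8 could then be written as `df.select(*[quote_column(c) for c in df.columns])`, or for a single column as `df.select("`C.A1.T1`")`.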