
I have a data frame with the following column:

Last Login- Date & Time(Incl. Time Zone)

When I read the data and print the schema, the column is listed:
df.printSchema()

[Screenshot: output of df.printSchema()]

But when I try to select the column from the data frame, it fails:

df.select(col("Last Login- Date & Time(Incl. Time Zone)"))

AnalysisException: cannot resolve '`Last Login- Date & Time(Incl. Time Zone)`'
given input columns: [`Last Login- Date & Time(Incl. Time Zone)`]
ZygD
Jim Macaulay

2 Answers


Try replacing the backticks (`) in the column name with underscores (_).

Example:

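from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Column whose name literally contains backticks
df = spark.createDataFrame([('1',)],['`Last Login- Date & Time(Incl. Time Zone)`'])
# Rename every column, replacing each backtick with an underscore
df = df.toDF(*(c.replace('`', '_') for c in df.columns))
# The outer backticks here only quote the identifier for Spark SQL
df.selectExpr("`_Last Login- Date & Time(Incl. Time Zone)_`").show()
#+------------------------------------------+
#|_Last Login- Date & Time(Incl. Time Zone)_|
#+------------------------------------------+
#|                                         1|
#+------------------------------------------+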
notNull
  • There is no backtick in the source column ```Last Login- Date & Time(Incl. Time Zone)```. When the data is converted to a data frame, Spark adds the backticks by default because the column name contains a special character, the dot (.). The backticks are generated by Spark: the actual column does not have them, but the printSchema() result does. – Jim Macaulay Aug 04 '23 at 15:20

As can be seen in the screenshot, your column name is surrounded by backticks (`). If this is not intentional, you may want to remove the backticks. Otherwise, when selecting the column, quote the whole name in backticks and double every backtick that is part of the name, which here gives triple backticks at each end:

from pyspark.sql import functions as F
df = spark.range(1).toDF('`Last Login- Date & Time(Incl. Time Zone)`')
df.printSchema()
# root
#  |-- `Last Login- Date & Time(Incl. Time Zone)`: long (nullable = false)

df.select(F.col("```Last Login- Date & Time(Incl. Time Zone)```")).show()
# +------------------------------------------+
# |`Last Login- Date & Time(Incl. Time Zone)`|
# +------------------------------------------+
# |                                         0|
# +------------------------------------------+
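
The quoting rule behind the triple backticks can be sketched in plain Python. This is only an illustration, assuming Spark's usual identifier quoting (wrap the name in backticks and write any backtick that is part of the name twice); `quote_identifier` is a hypothetical helper, not part of PySpark.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any backtick inside it.

    Hypothetical helper (not a PySpark function); it assumes Spark's
    convention that a literal backtick inside a quoted identifier is
    written as two backticks.
    """
    return "`" + name.replace("`", "``") + "`"


# A name without backticks only gets the outer pair.
plain = quote_identifier("Last Login- Date & Time(Incl. Time Zone)")

# A name that itself starts and ends with a backtick ends up with three
# backticks on each side: one outer quote plus one doubled backtick.
quoted = quote_identifier("`Last Login- Date & Time(Incl. Time Zone)`")

print(plain)
print(quoted)
```

The resulting string could then be passed to `F.col(...)` or `selectExpr(...)` in place of a hand-written quoted name.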
ZygD
  • There is no backtick in the source column ```Last Login- Date & Time(Incl. Time Zone)```. When the data is converted to a data frame, Spark adds the backticks by default because the column name contains a special character, the dot (.). The backticks are generated by Spark: the actual column does not have them, but the printSchema() result does. – Jim Macaulay Aug 04 '23 at 15:20
  • I have created the column without backticks, and `printSchema` no longer shows backticks, even though the name still contains the dot. Are you sure your column name has no backticks? Which Spark version are you using? – ZygD Aug 04 '23 at 15:24
  • Not the brackets; the special character is the dot (.). – Jim Macaulay Aug 04 '23 at 15:27
  • I did not mention _brackets_. And `printSchema` does not include backticks by default. The dot is not very special, I have successfully created the column with the dot. Please specify your Spark version, then I will be able to test everything in that version. – ZygD Aug 04 '23 at 15:29
  • Have you tried what I have suggested? If it did not work, what error did you get? – ZygD Aug 04 '23 at 15:32
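
One way to settle whether the backticks are really part of the column name is to inspect the raw name strings rather than the printed schema. A minimal sketch, assuming the names come from `df.columns` (which returns a list of plain strings in PySpark); `names_with_backticks` is a hypothetical helper and the sample list is made up for illustration.

```python
def names_with_backticks(columns):
    """Return the column names that contain a literal backtick character."""
    return [c for c in columns if "`" in c]


# Illustrative stand-in for df.columns; with a real data frame one would
# pass df.columns instead.
sample_columns = [
    "`Last Login- Date & Time(Incl. Time Zone)`",  # backticks are part of the name
    "Last Login- Date & Time(Incl. Time Zone)",    # no backticks in the name
]

for name in names_with_backticks(sample_columns):
    # repr() makes leading/trailing backticks or spaces visible
    print(repr(name))
```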