I am trying to loop through each column of a DataFrame, based on conditions, and pass each column name from the iteration into the expression below. The problem with the code below is that PySpark treats columnName as the literal name of a column rather than as a Python variable that holds a column name. How can I do this?
    df_excelDate = df1.withColumn(
        'excelDate',
        expr("case when columnName > 0 AND columnName < datediff(current_timestamp(), to_date('1899-12-31', 'yyyy-MM-dd')) then True when columnName IS NULL then NULL else False end")
    )
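
One approach I am considering, sketched below, is to interpolate the variable into the SQL string with an f-string so that expr() receives the actual column name on each pass of the loop. Here candidate_cols and the per-column output names are just placeholders for whatever actually drives my loop, and I am not sure this is the idiomatic way to do it:

    from pyspark.sql.functions import expr

    # Placeholder list; in practice this would come from whatever condition
    # selects the columns to check (e.g. all numeric columns of df1).
    candidate_cols = ["col_a", "col_b"]

    df_out = df1
    for columnName in candidate_cols:
        # Build the SQL text with the Python variable substituted in, so expr()
        # sees the actual column name instead of the literal word "columnName".
        df_out = df_out.withColumn(
            f"{columnName}_excelDate",
            expr(
                f"case when {columnName} > 0 "
                f"and {columnName} < datediff(current_timestamp(), to_date('1899-12-31', 'yyyy-MM-dd')) then true "
                f"when {columnName} is null then null "
                f"else false end"
            ),
        )

Is interpolating the string like this reasonable, or is there a cleaner way to parameterize the column name in the expression?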