
I am trying to loop through each column of a data frame based on conditions and pass each column name into the expression on each iteration.

The problem with the below is that pyspark thinks that columnName is the literal name of a column rather than a variable that holds a column name. How can I do this?

df_excelDate = df1.withColumn('excelDate', expr("case when columnName > 0 AND columnName < datediff(current_timestamp(), to_date('1899-12-31', 'yyyy-MM-dd')) then True when columnName IS NULL then NULL else False end"))
  • just use format(): `expr("case when {0}>0 AND {0} <......".format(columnName))` – jxc Jan 03 '20 at 16:41
  • @jxc I've never used format. Is there a link to some documentation on the web, or can you give a simple example? Or is that how it should look? (e.g. ```expr("{0} + 1 > 2*{0}".format(variableName))``` would be the variable + 1 > 2 times the variable in the expression) – Tiger_Stripes Jan 03 '20 at 17:06
  • check https://docs.python.org/3.4/library/string.html#format-examples, this is basically a pure Python question on how to format a string which you can use in expr() function. – jxc Jan 03 '20 at 17:43
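
Based on jxc's suggestion in the comments, here is a minimal sketch of the loop. The DataFrame `df1`, the derived column names, and the Excel-epoch cutoff expression are assumptions for illustration only:

```python
from pyspark.sql.functions import expr

# Days between the Excel epoch and now, reused inside every generated expression.
excel_epoch_days = "datediff(current_timestamp(), to_date('1899-12-31', 'yyyy-MM-dd'))"

for columnName in df1.columns:
    df1 = df1.withColumn(
        columnName + '_excelDate',
        expr(
            # str.format() substitutes the actual column name into the SQL string
            # before expr() parses it, so Spark never sees the literal text "columnName".
            "case when {0} > 0 AND {0} < {1} then True "
            "when {0} IS NULL then NULL "
            "else False end".format(columnName, excel_epoch_days)
        )
    )
```

Each pass through the loop builds the SQL string with the current column name already inserted, which is why Spark resolves it as a real column rather than complaining about an unknown column called columnName.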

0 Answers