Subtracting an int column from a date column with date_add in pyspark

Question

I have a pyspark dataframe df

+------------+------+
|  timestamp | days |
+------------+------+
| 2019-11-21 |    5 |
| 2019-10-22 |   21 |
|        ... |  ... |
+------------+------+

I want to subtract the days column from the timestamp column with:

import pyspark.sql.functions as F

df.withColumn("timestamp", F.date_add(F.col("timestamp"), -F.col("days")))

The expected result would be:

+------------+------+
|  timestamp | days |
+------------+------+
| 2019-11-16 |    5 |
| 2019-10-01 |   21 |
|        ... |  ... |
+------------+------+

But I only get an error: TypeError: Column is not iterable

Is there a way to get this to work?

Specifically: `df.withColumn("timestamp", F.expr("date_add(timestamp, -days)"))` — pault, Nov 21 '19 at 21:12
It's the same as the duplicate I linked. You can accept the duplicate and close it yourself. — pault, Nov 21 '19 at 21:47
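
For reference, the expr() suggestion from the comment above can be sketched as follows. The likely cause of the TypeError is that, in the PySpark version current when this was asked, the Python wrapper F.date_add expected a plain int as its second argument, whereas the SQL function date_add inside expr() takes a column expression. The helper names minus_days_sql, subtract_days and demo are made up for illustration, and the sample rows are the two from the question. Treat it as a sketch under those assumptions rather than a definitive implementation.

```python
def minus_days_sql(date_col: str, days_col: str) -> str:
    """Build the Spark SQL snippet that shifts date_col back by days_col days."""
    return f"date_add({date_col}, -{days_col})"


def subtract_days(df, date_col: str = "timestamp", days_col: str = "days"):
    """Return df with date_col replaced by date_col minus days_col, row by row."""
    # Imported here so the string helper above stays usable without Spark.
    import pyspark.sql.functions as F

    # expr() parses the string as SQL, where date_add accepts a column for the day count.
    return df.withColumn(date_col, F.expr(minus_days_sql(date_col, days_col)))


def demo():
    """Rebuild the question's two sample rows and show the shifted dates."""
    from datetime import date
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(date(2019, 11, 21), 5), (date(2019, 10, 22), 21)],
        ["timestamp", "days"],
    )
    subtract_days(df).show()
    spark.stop()
```

Calling demo() on a machine with a working Spark installation should print the two shifted rows, which can be compared against the expected result in the question.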

score 1 · Answer 1 · answered Nov 21 '19 at 20:20


Using a udf was the solution.

date_add_udf = F.udf(lambda date, days: F.date_add(date, days), pyspark.sql.types.TimestampType())

And then calling it:

df.withColumn("timestamp", date_add_udf(F.col("timestamp"), -F.col("days")))

answered Nov 21 '19 at 20:20

jho

When I try this method, I get an error like in [this post](https://stackoverflow.com/questions/53751266/attributeerror-nonetype-object-has-no-attribute-jvm-pyspark-udf). It looks like pyspark.sql.functions are not allowed inside of UDFs. Is this particular to Databricks? – Rachel Oct 19 '21 at 18:51
Just found [an alternate answer](https://newbedev.com/how-to-subtract-a-column-of-days-from-a-column-of-dates-in-pyspark). You use expr() instead of a UDF. – Rachel Oct 19 '21 at 18:55
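
The error described in the comment above is consistent with calling F.date_add inside the UDF body: pyspark.sql.functions build JVM column expressions on the driver and are probably not usable inside the Python function that runs on each row. If a UDF is still preferred over expr(), one option is to do the arithmetic with plain Python datetime objects. The sketch below is not the answerer's code. The names shift_back and make_subtract_days_udf are hypothetical, and it assumes the timestamp column is a DateType column rather than a string.

```python
from datetime import date, timedelta


def shift_back(d, days):
    """Plain-Python date arithmetic; returns None if either input is null."""
    if d is None or days is None:
        return None
    return d - timedelta(days=int(days))


def make_subtract_days_udf():
    """Wrap shift_back as a Spark UDF that returns DateType."""
    # Imported lazily so shift_back can be exercised without a Spark install.
    import pyspark.sql.functions as F
    from pyspark.sql.types import DateType

    return F.udf(shift_back, DateType())
```

It could then be applied as df.withColumn("timestamp", make_subtract_days_udf()(F.col("timestamp"), F.col("days"))). The days column is passed unnegated because shift_back already subtracts. A Python UDF is generally slower than the built-in SQL function, so expr() is usually the better choice where it is available.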