
I am using PySpark. I have one column with a date datatype and another column with an integer datatype.

See sample below:

date         subtract
2019-01-08   7
2019-01-04   2

I want to create a new column called "new_date" that subtracts the "subtract" column value from the "date" column.

Below is my desired output:

date         subtract  new_date   
2019-01-08   7         2019-01-01
2019-01-04   2         2019-01-02

I tried the code below:

df = df.withColumn('new_date', F.date_sub(df.date, df.subtract))

Below is the error I get: TypeError: 'Column' object is not callable

PineNuts0
  • Possible duplicate of [Using a column value as a parameter to a spark DataFrame function](https://stackoverflow.com/questions/51140470/using-a-column-value-as-a-parameter-to-a-spark-dataframe-function) – pault Jan 17 '19 at 18:52
  • See the linked dupe for details. In short: you can't use a column value as a parameter to an API function, but one workaround is to use `pyspark.sql.functions.expr`, for example: `df = df.withColumn("new_date", F.expr("date_sub(date, subtract)"))` – pault Jan 17 '19 at 18:54

1 Answer


Try this:

df.withColumn("new_date", F.expr("date_sub(date, subtract)"))
theletz