
I am using PySpark. I have one column with a date datatype and another column with an integer datatype.

See sample below:

date         subtract
2019-01-08   7
2019-01-04   2

I want to create a new column called "new_date" that subtracts the "subtract" column value from the "date" column.

Below is my desired output:

date         subtract  new_date   
2019-01-08   7         2019-01-01
2019-01-04   2         2019-01-02

I tried the code below:

df = df.withColumn('new_date', F.date_sub(df.date, df.subtract))

Below is the error I get: TypeError: 'Column' object is not callable

PineNuts0
  • Possible duplicate of [Using a column value as a parameter to a spark DataFrame function](https://stackoverflow.com/questions/51140470/using-a-column-value-as-a-parameter-to-a-spark-dataframe-function) – pault Jan 17 '19 at 18:52
  • See the linked dupe for details. In short: you can't use a column value as a parameter to an API function, but one workaround is to use `pyspark.sql.functions.expr`, for example: `df = df.withColumn("new_date", F.expr("date_sub(date, subtract)"))` – pault Jan 17 '19 at 18:54

1 Answer


Try this:

df.withColumn("new_date", F.expr("date_sub(date, subtract)"))
theletz