I have a dataframe with some columns:
+------------+--------+----------+----------+
|country_name| ID_user|birth_date|      psdt|
+------------+--------+----------+----------+
|      Россия|16460783|       486|1970-01-01|
|      Россия|16467391|      4669|1970-01-01|
|      Россия|16467889|      6861|1970-01-01|
|   Казахстан|16468013|      5360|1970-01-01|
|      Россия|16471027|      6311|1970-01-01|
|      Россия|16474162|      5567|1970-01-01|
|      Россия|16476386|      4351|1970-01-01|
|      Россия|16481067|      3831|1970-01-01|
|   Казахстан|16485965|     -2369|1970-01-01|
|    Германия|16486027|      5864|1970-01-01|
+------------+--------+----------+----------+
only showing top 10 rows
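To make the goal concrete, here is a sketch in plain Python (not Spark) of the value wanted for a single row. It assumes "birth_date" holds a day offset, possibly negative, which is how date_add would read it. The helper name `add_days` is hypothetical and used only for illustration.

```python
from datetime import date, timedelta


def add_days(start: date, days: int) -> date:
    """Return `start` shifted by `days` days; negative values go backwards."""
    return start + timedelta(days=days)


psdt = date(1970, 1, 1)  # the value shown in the "psdt" column

print(add_days(psdt, 486))    # 1971-05-02
print(add_days(psdt, -2369))  # 1963-07-08
```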
I need to add the number of days in "birth_date" to the date in "psdt". I wrote this code, but sf.date_add doesn't work:
resultbirthDF = (
    resultDF
    .select(
        sf.col("country_name"),
        sf.col("ID_user"),
        sf.col("birth_date"),
        sf.lit(past_datetr).alias("psdt")
    )
    .withColumn("birth_datetrue", sf.date_add(sf.to_date(sf.col("psdt")), sf.col("birth_date")))
).show(10)
It fails with this error:
'Column' object is not callable
Traceback (most recent call last):
  File "/volumes/disk1/yarn/local/usercache/livy/appcache/application_1573843665329_0786/container_e05_1573843665329_0786_01_000001/pyspark.zip/pyspark/sql/functions.py", line 1006, in date_add
    return Column(sc._jvm.functions.date_add(_to_java_column(start), days))
How can I solve this problem?