
I have the following summary for a dataset, using PySpark on Databricks:

OrderMonthYear                SaleAmount
2012-11-01T00:00:00.000+0000  473760.5700000001
2010-04-01T00:00:00.000+0000  490967.0900000001

I'm getting a DataFrame error with this map function, which tries to convert OrderMonthYear to an integer:

results = summary.map(lambda r: (int(r.OrderMonthYear.replace('-','')), r.SaleAmount)).toDF(["OrderMonthYear","SaleAmount"])

Any ideas? The error is:

AttributeError: 'DataFrame' object has no attribute 'map'
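As background for the error: in Spark 2.x and later, `DataFrame` has no `.map`; that method lives on the underlying RDD (`summary.rdd.map(...)`). Separately, the per-row conversion the lambda attempts can be sketched in plain Python — `order_month_to_int` is a hypothetical helper name, and the sample value is taken from the question:

```python
from datetime import datetime

def order_month_to_int(ts: str) -> int:
    """Parse an ISO-8601 timestamp string and return it as a
    yyyyMMdd-style integer (hypothetical helper for illustration)."""
    # keep only the date part; the time, fractional seconds, and
    # '+0000' offset would otherwise break a plain strptime parse
    dt = datetime.strptime(ts[:10], "%Y-%m-%d")
    return int(dt.strftime("%Y%m%d"))

print(order_month_to_int("2012-11-01T00:00:00.000+0000"))  # 20121101
```

Note this only applies if the column is a string; as the comments below establish, the column here is a timestamp, so the `.replace` call in the lambda would fail anyway.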
Thanh Nguyen Van
  • You can't convert that to an integer because there are strings that you didn't replace (T, +, :) – mck Apr 07 '21 at 16:35
  • Hey, thanks for the reply; the column is a timestamp, not a string: DataFrame[OrderMonthYear: timestamp] – Tanai Goncalves Apr 07 '21 at 17:05
  • Then why are you calling `replace`? That's a string method. – mck Apr 07 '21 at 17:24
  • Got it. Even when I try to use datetime functions it doesn't work: test = summary.select("OrderMonthYear").apply(lambda x: x.strftime('%d%m%Y')) raises 'DataFrame' object has no attribute 'apply'. I guess my SQL call is confusing the DataFrame structure? data = sqlContext.read.format("csv") – Tanai Goncalves Apr 07 '21 at 17:54
  • What's your desired output? – mck Apr 07 '21 at 18:04

1 Answer


Found a solution here: Pyspark date yyyy-mmm-dd conversion

from pyspark.sql.functions import col, date_format

# OrderMonthYear is already a timestamp column, so date_format can be applied
# to it directly; the unix_timestamp/from_unixtime round-trip from the linked
# answer (and its 'yyyy-MMM' parse pattern, which doesn't match this data) is
# only needed when the column is a string.
df = summary.withColumn("new_date_str", date_format(col("OrderMonthYear"), "yyyyMMdd"))

# cast to the integer the question asked for
df2 = df.withColumn("OrderMonthYear", col("new_date_str").cast("int"))
display(df2)

Thank you @mck for the help! Cheers.
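For readers mapping between the two format languages involved: Spark's `date_format` uses JVM-style patterns (`yyyyMMdd`), which correspond to Python's `strftime` codes (`%Y%m%d`). A minimal sketch of that correspondence, using the sample date from the question:

```python
from datetime import datetime

# JVM pattern "yyyyMMdd" (used by Spark's date_format) maps to
# Python strftime "%Y%m%d"; verify on the question's first row
sample = datetime(2012, 11, 1)
formatted = sample.strftime("%Y%m%d")
print(formatted)       # '20121101'
print(int(formatted))  # 20121101 -- the integer form the question wanted
```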