I am new to PySpark and was wondering how to use method chaining there. In pandas I would use assign with a lambda, for example:
import pandas as pd

df = pd.DataFrame({'number': [1, 2, 3], 'date': ['31-dec-19', '02-jan-18', '14-mar-20']})
df = (df.assign(number_plus_one=lambda x: x.number + 1)
        .assign(date=lambda x: pd.to_datetime(x.date))
        .loc[lambda x: x.number_plus_one.isin([2, 3])]
        .drop(columns=['number', 'number_plus_one'])
     )
How would you write the same code in PySpark without converting it to a pandas DataFrame? I guess you could use withColumn, filter, and drop, but how exactly would you chain them?
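Below is a rough sketch of what I have in mind. I am assuming F.to_date with the pattern 'dd-MMM-yy' is the right counterpart to pd.to_datetime here, and I am not sure this is correct or idiomatic:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame(
    [(1, '31-dec-19'), (2, '02-jan-18'), (3, '14-mar-20')],
    ['number', 'date'],
)

# My guess at the chained equivalent of assign / loc / drop
result = (sdf.withColumn('number_plus_one', F.col('number') + 1)
             .withColumn('date', F.to_date('date', 'dd-MMM-yy'))  # format string is a guess
             .filter(F.col('number_plus_one').isin([2, 3]))
             .drop('number', 'number_plus_one')
         )

Is this the usual way to chain these operations, or is there a more idiomatic pattern, something closer to pandas' assign?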