Let's say I have education level as a column, stored as a string. How would you convert it to a categorical variable? Is this even necessary in PySpark? In pandas, I'm told that categorical data is much faster to process.
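```python
from pyspark.sql.types import DateType

# Cast the string column BIRTHDAY to a date column
df = df.withColumn("BIRTHDAY", df["BIRTHDAY"].cast(DateType()))
```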
That's how I'd convert a string column to a date.