
Let's say I have education level as a column, stored as a string. How would I convert it to a categorical variable? Is this even necessary in PySpark? In pandas, I'm told that categorical data is much faster to process.

from pyspark.sql.types import DateType
df = df.withColumn("BIRTHDAY", df["BIRTHDAY"].cast(DateType()))

That's how I'd convert a string to a date.

user798719
