I have a column with values like 'Jan 2018', 'Mar 2019', 'Dec 2016'. I want to convert it to a date type in the format MMM yyyy. When I do this in PySpark, the resulting DataFrame also includes the day, e.g. (2018, 1, 1). How can I get rid of the day?
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date
# Start a local Spark session
spark = SparkSession.builder.master("local").appName("Date").getOrCreate()
df = spark.createDataFrame([('Jan 2018',)], ['Month_Year'])
# Parse the 'MMM yyyy' string into a date column and collect the rows
rows = df.select(to_date(df.Month_Year, 'MMM yyyy').alias('dt')).collect()
print(rows)
Output: [Row(dt=datetime.date(2018, 1, 1))]
My expected output is (2018, 1), (Jan 2018), or (1, 2018), i.e. only the month and year.
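For context, a date value always carries a day, so parsing 'Jan 2018' fills in day 1 by default. A month-and-year result therefore has to be derived from the parsed date rather than stored as a date itself. The following is a minimal sketch of that idea in plain Python, not PySpark; the variable names (raw, parsed, year_month, label) are illustrative only.

```python
from datetime import datetime

raw = "Jan 2018"  # sample value from the question

# Parsing yields a full date; the missing day defaults to 1.
parsed = datetime.strptime(raw, "%b %Y").date()

# Derive month-and-year representations from the full date.
year_month = (parsed.year, parsed.month)   # integer pair
label = parsed.strftime("%b %Y")           # formatted string

print(parsed)
print(year_month)
print(label)
```

On a Spark date column, the functions year, month, and date_format in pyspark.sql.functions appear to play the same role, for example date_format applied to the dt column with the pattern 'MMM yyyy'. The result is then an integer or string column, not a date.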