
I have the following code which creates windows and aggregates values in the windows.

import pyspark.sql.functions as func
from pyspark.sql.functions import window

df.groupBy(window("time", "30 minutes")) \
  .agg(func.countDistinct("customer_numbers"))

The window column (the column that holds the time periods) is now a struct of two datetimes:

[datetime1, datetime2]. 

My dataframe looks like this:

window                                       customer_numbers
[2018-02-04:10:00:00, 2018-02-04:10:30:00]          10
[2018-02-04:10:30:00, 2018-02-04:11:00:00]          15

I want it to look like this:

start                      end                        customer_numbers
2018-02-04:10:00:00        2018-02-04:10:30:00        10
2018-02-04:10:30:00        2018-02-04:11:00:00        15

I want to split this out into two columns and get rid of the original window column, but I can't seem to find a way to do it. I think in Scala you can simply do something like the following to get the first item, but I do not know how to do this in PySpark. I tried adding a UDF, but instead of giving me the first value it gave me a calendar.

.withColumn("key", $"window"._1)

Does anyone know how I can achieve this?

SecretAgent
    Possible duplicate of [Querying Spark SQL DataFrame with complex types](https://stackoverflow.com/questions/28332494/querying-spark-sql-dataframe-with-complex-types) – philantrovert Jun 07 '18 at 07:48

1 Answer


You can simply use the select function, reading the struct's start and end fields by name:

.select(func.col('window').start.alias('start'),
        func.col('window').end.alias('end'),
        func.col('customer_numbers'))
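For context, here is a minimal end-to-end sketch; the input DataFrame, sample rows, and timestamps are assumptions modeled on the question:

import pyspark.sql.functions as func
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample events, modeled on the question's schema.
df = spark.createDataFrame(
    [("2018-02-04 10:05:00", "c1"),
     ("2018-02-04 10:15:00", "c2"),
     ("2018-02-04 10:40:00", "c3")],
    ["time", "customer_numbers"],
).withColumn("time", func.col("time").cast("timestamp"))

result = (
    df.groupBy(window("time", "30 minutes"))
      .agg(func.countDistinct("customer_numbers").alias("customer_numbers"))
      # The window struct has named fields 'start' and 'end', so selecting
      # them by name splits it into two columns; the struct itself is not
      # carried along because select only returns the listed columns.
      .select(
          func.col("window").start.alias("start"),
          func.col("window").end.alias("end"),
          func.col("customer_numbers"),
      )
)
result.show(truncate=False)

If you don't need to rename the fields, df.select("window.*", "customer_numbers") expands the struct into its start and end columns in one step.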
Ramesh Maharjan