How to get the first and last element from column created from collect_list in Java Spark

Question

If I understand correctly, using groupBy().agg(collect_list(column)) gives me a column of lists. How do I get the first and last elements of each list to create new columns (in a Spark Dataset, Java)?

For first, I can do something like this

.withColumn("firstItem", functions.col("list").getItem(0))

but how do I handle an empty list?

For last item, I was thinking about size()-1, but in Java, -1 isn't supported in a Spark Dataset. I tried:

.withColumn("lastItem", functions.col("list").getItem(functions.size(functions.col("list")).minus(1)))

but it complains about an unsupported type error.

Using `groupBy` and `collect_list` will change the order of the items. It would be better to look into the window functions where you can use `orderBy` and the `first` and `last` methods. — Shaido, Feb 08 '18 at 06:55

score 2 · Accepted Answer · answered Feb 08 '18 at 11:52

To answer your questions:

but how do I handle empty list?

Just don't worry about it. Accessing a non-existent index gives NULL (undefined), so there is no problem here.

If you want a default value, use org.apache.spark.sql.functions.coalesce with org.apache.spark.sql.functions.lit.
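A minimal sketch of the coalesce/lit approach, in case it helps. The class name, the sample data, and the "n/a" placeholder are made up for illustration. It also assumes non-ANSI mode (spark.sql.ansi.enabled=false); with ANSI mode on, an out-of-range index may raise an error instead of giving NULL.

```java
import static org.apache.spark.sql.functions.coalesce;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FirstItemWithDefault {

    // Adds "firstItem": element 0 of "list", or the placeholder "n/a" when it is missing.
    static Dataset<Row> withFirstItem(Dataset<Row> df) {
        return df.withColumn("firstItem", coalesce(col("list").getItem(0), lit("n/a")));
    }

    // Builds a two-row sample (one non-empty list, one empty) and returns firstItem per row.
    static List<String> demo() {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("first-item-with-default")
            // Assumption: non-ANSI mode, where an out-of-range index yields NULL.
            .config("spark.sql.ansi.enabled", "false")
            .getOrCreate();

        StructType schema = new StructType()
            .add("id", DataTypes.IntegerType)
            .add("list", DataTypes.createArrayType(DataTypes.StringType));

        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create(1, Arrays.asList("a", "b", "c")),
                RowFactory.create(2, Collections.emptyList())),
            schema);

        return withFirstItem(df).orderBy("id").select("firstItem").collectAsList()
            .stream().map(r -> r.getString(0)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The default passed to lit has to match the element type of the list (a string here).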

For last item, I was thinking about size()-1, but in Java, -1 isn't supported

Use apply, not getItem:

import static org.apache.spark.sql.functions.*;

col("list").apply(size(col("list")).minus(lit(1)));
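Put into a full program, the apply call above could look roughly like this. The class name and sample rows are hypothetical, and the session is again set to non-ANSI mode, so that index -1 on an empty list yields NULL rather than an error.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.size;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class LastItem {

    // Adds "lastItem" by indexing "list" with the Column expression size(list) - 1.
    static Dataset<Row> withLastItem(Dataset<Row> df) {
        return df.withColumn("lastItem", col("list").apply(size(col("list")).minus(lit(1))));
    }

    // Builds a two-row sample (one non-empty list, one empty) and returns lastItem per row.
    static List<String> demo() {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("last-item")
            // Assumption: non-ANSI mode, where an out-of-range index yields NULL.
            .config("spark.sql.ansi.enabled", "false")
            .getOrCreate();

        StructType schema = new StructType()
            .add("id", DataTypes.IntegerType)
            .add("list", DataTypes.createArrayType(DataTypes.StringType));

        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create(1, Arrays.asList("a", "b", "c")),
                RowFactory.create(2, Collections.emptyList())),
            schema);

        // Row.getString returns null for a NULL cell, so the empty list maps to null here.
        return withLastItem(df).orderBy("id").select("lastItem").collectAsList()
            .stream().map(r -> r.getString(0)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```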

In practice:

Just use the min and max functions. Don't replicate groupByKey in SQL.
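As a rough sketch of what that means: aggregate directly instead of collecting a list first. The column names "key" and "value" and the sample rows are assumptions for illustration. Note that min and max return the smallest and largest values per group, which matches "first" and "last" only when the order you care about is the order of the values themselves.

```java
import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.min;

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class MinMaxPerGroup {

    // One aggregation per group, with no intermediate list column.
    static Dataset<Row> firstAndLast(Dataset<Row> df) {
        return df.groupBy("key")
            .agg(min("value").as("firstItem"), max("value").as("lastItem"));
    }

    // Builds a small sample and returns "key:firstItem:lastItem" per group, sorted by key.
    static List<String> demo() {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("min-max-per-group")
            .getOrCreate();

        StructType schema = new StructType()
            .add("key", DataTypes.StringType)
            .add("value", DataTypes.IntegerType);

        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create("a", 3),
                RowFactory.create("a", 1),
                RowFactory.create("a", 2),
                RowFactory.create("b", 5)),
            schema);

        return firstAndLast(df).orderBy("key").collectAsList().stream()
            .map(r -> r.getString(0) + ":" + r.getInt(1) + ":" + r.getInt(2))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```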

Related:

How to select the first row of each group?

score 2 · Answer 2 · edited Sep 20 '19 at 10:27

An empty list will simply return null instead of any error. Do this for the last item.

import static org.apache.spark.sql.functions.*;

df.withColumn("lastItem", reverse(col("list")).getItem(0));

answered Sep 20 '19 at 10:19 by Rahul Patil · edited Sep 20 '19 at 10:27 by Pavindu
