I want to remove consecutive duplicates from an array in Hive. collect_list() keeps all duplicates, while collect_set() keeps only distinct entries. I need something in between: collapse runs of identical adjacent values, but keep a value if it reappears later.
For example, given the table below:
id | number
===========
fk | 4
fk | 4
fk | 2
4f | 1
4f | 8
4f | 8
h9 | 7
h9 | 4
h9 | 7
I would like to get something like this:
id | aggregate
=======================
fk | Array<int>(4,2)
4f | Array<int>(1,8)
h9 | Array<int>(7,4,7)
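
To pin down the behavior I am after, here is a minimal Python sketch of the semantics. It is not Hive code and not a proposed solution. The helper name collapse_consecutive is made up for illustration, and the sketch assumes the rows are processed in exactly the order shown in the table above.

```python
from itertools import groupby


def collapse_consecutive(values):
    """Collapse runs of adjacent equal values; later repeats are kept."""
    # groupby() groups only *adjacent* equal items, so [7, 4, 7] stays intact.
    return [key for key, _ in groupby(values)]


# Rows from the example table, in the order shown (assumed to be the row order).
rows = [
    ("fk", 4), ("fk", 4), ("fk", 2),
    ("4f", 1), ("4f", 8), ("4f", 8),
    ("h9", 7), ("h9", 4), ("h9", 7),
]

# Collect the numbers per id, preserving row order (what collect_list() would give).
per_id = {}
for row_id, number in rows:
    per_id.setdefault(row_id, []).append(number)

# Apply the "middle ground" between collect_list() and collect_set().
aggregate = {row_id: collapse_consecutive(nums) for row_id, nums in per_id.items()}
print(aggregate)
# {'fk': [4, 2], '4f': [1, 8], 'h9': [7, 4, 7]}
```

Note that h9 keeps both 7s because they are not adjacent, which is exactly where collect_set() falls short.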