
How to convert rows into string values in Apache Spark

I have a Spark DataFrame like this:

fruit | name
--------------
fruit | apple
fruit | orange
fruit | mango

I want to convert it into this:

fruit | string
----------------------------
fruit | apple, orange, mango

How can I achieve this in Apache Spark?

Kaptrain
  • look at `collect_list` – mtoto Nov 30 '16 at 09:53
  • Possible duplicate of [SPARK SQL replacement for mysql GROUP\_CONCAT aggregate function](http://stackoverflow.com/questions/31640729/spark-sql-replacement-for-mysql-group-concat-aggregate-function) – mtoto Nov 30 '16 at 12:30

1 Answer


Look at `collect_list`:

df.groupBy("fruit").agg(collect_list("name"))

This groups the rows by `fruit` and collects the `name` values of each group into an array, which is returned as a new column.

If you want a string rather than an array, please see the possible-duplicate question linked in the comments on the question (thanks @mtoto).

T. Gawęda