
I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name.

item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5

I want to generate a result where the names for each item_id are concatenated:

item_id     names
1           name1-name2-name3
2           name4-name5

How do I create such a table with Spark SQL?
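
For reference, a minimal sketch to reproduce the sample data in Spark SQL (the view name items is just a placeholder, not part of the original setup):

CREATE OR REPLACE TEMPORARY VIEW items AS
SELECT * FROM VALUES
  (1, 'name1'), (1, 'name2'), (1, 'name3'),
  (2, 'name4'), (2, 'name5') AS t(item_id, name);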

Jacek Laskowski
raju
  • Does this answer your question? [pyspark collect_set or collect_list with groupby](https://stackoverflow.com/questions/37580782/pyspark-collect-set-or-collect-list-with-groupby) – yahoo Oct 15 '20 at 07:40

3 Answers


The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL) you can somewhat figure out other variants.

The following SQL statement seems to do what you ask for:

SELECT item_id, array_join(collect_list(name), '-') as names 
FROM tableName
GROUP BY item_id

In spark-shell it gives the following result:

scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id|            names|
+-------+-----------------+
|      1|name1-name2-name3|
|      2|      name4-name5|
+-------+-----------------+
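
Note that collect_list does not guarantee any particular ordering of the collected names. If a deterministic order within each group is needed (an assumption on my part, not something the question requires), one option is to sort the array before joining:

SELECT item_id, array_join(sort_array(collect_list(name)), '-') AS names
FROM tableName
GROUP BY item_id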
Jacek Laskowski

You can try the below:

from pyspark.sql.functions import array_join, collect_list

(
    df.orderBy('name', ascending=False)
    .groupBy('item_id')
    .agg(
        array_join(
            collect_list('name'),
            delimiter='-',
        ).alias('names')
    )
)
Fahmi

You can use the Spark DataFrame's groupBy and agg methods together with the concat_ws function:

import org.apache.spark.sql.functions.{collect_list, concat_ws}
df.groupBy($"item_id").agg(concat_ws("-", collect_list($"name")).alias("names")).show()

This groups the rows by item_id and aggregates the name values in each group by concatenating them with a dash.
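
Since the question asks for plain SQL, the same logic can also be expressed directly in Spark SQL (a sketch, assuming the DataFrame is registered as a temporary view named tableName):

SELECT item_id, concat_ws('-', collect_list(name)) AS names
FROM tableName
GROUP BY item_id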

Adam