Combine two columns' values into one in Spark (Python)

Asked Jun 24 '19 at 16:06

Active Jun 24 '19 at 16:12

I have this table below:

FrameForm | Sections | Framefrom_section | FrameFrom_echelon
----------|----------|-------------------|------------------
70        |  11/12   |       11/12       |      50004
70        |  13/14   |       13/14       |      60003

How can I group on the FrameForm column in PySpark and combine the values of Framefrom_section and FrameFrom_echelon into lists, to obtain this result:

FrameForm | Framefrom_section | FrameFrom_echelon
----------|-------------------|------------------
70        | [11/12,13/14]     |    [50004,60003]

edited Jun 24 '19 at 16:12

pault

asked Jun 24 '19 at 16:06

vero

`from pyspark.sql.functions import collect_list` and then do `new_df = df.groupBy("FrameForm").agg(*[collect_list(c).alias(c) for c in ["Framefrom_section", "FrameFrom_echelon"])` .. looking for a dupe – pault Jun 24 '19 at 16:12

Possible duplicate of [pyspark collect\_set or collect\_list with groupby](https://stackoverflow.com/questions/37580782/pyspark-collect-set-or-collect-list-with-groupby) – pault Jun 24 '19 at 16:12
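
To make the expected result concrete, here is a minimal plain-Python sketch of what the `groupBy("FrameForm")` plus `collect_list` suggestion in the comment computes on the two sample rows. It does not use Spark; the helper name `group_and_collect` and the tuple layout of `rows` are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows from the question, as
# (FrameForm, Framefrom_section, FrameFrom_echelon) tuples.
# The tuple layout is an assumption for this sketch.
rows = [
    (70, "11/12", 50004),
    (70, "13/14", 60003),
]


def group_and_collect(rows):
    """Group rows by FrameForm and collect the other two columns into lists.

    Hypothetical helper that mimics groupBy + collect_list in plain Python.
    """
    grouped = defaultdict(
        lambda: {"Framefrom_section": [], "FrameFrom_echelon": []}
    )
    for frame_form, section, echelon in rows:
        # Append each value to the list for its group, in input order.
        grouped[frame_form]["Framefrom_section"].append(section)
        grouped[frame_form]["FrameFrom_echelon"].append(echelon)
    return dict(grouped)


result = group_and_collect(rows)
print(result)
```

One difference worth keeping in mind: this sketch preserves input order, whereas Spark's `collect_list` does not guarantee the order of the collected values after a shuffle.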
