I have the table below:

FrameForm | Sections | Framefrom_section | FrameFrom_echelon
----------|----------|-------------------|------------------
70        |  11/12   |       11/12       |      50004
70        |  13/14   |       13/14       |      60003

How can I use PySpark to group rows by the FrameForm column and combine the values of Framefrom_section and FrameFrom_echelon into lists, to obtain this result:

FrameForm | Framefrom_section | FrameFrom_echelon
----------|-------------------|------------------
70        | [11/12,13/14]     |    [50004,60003]
pault
vero
  • Use `from pyspark.sql.functions import collect_list` and then do `new_df = df.groupBy("FrameForm").agg(*[collect_list(c).alias(c) for c in ["Framefrom_section", "FrameFrom_echelon"]])` .. looking for a dupe – pault Jun 24 '19 at 16:12
  • Possible duplicate of [pyspark collect_set or collect_list with groupby](https://stackoverflow.com/questions/37580782/pyspark-collect-set-or-collect-list-with-groupby) – pault Jun 24 '19 at 16:12

0 Answers