I have a pyspark dataframe

id events
a0 a-markets-l1
a0 a-markets-watch
a0 a-markets-buy
c7 a-markets-z2
c7 scroll_down
a0 a-markets-sell
b2 next_screen

I am trying to join the events by grouping on ID. Here's my Python code:

df_events_userpath = df_events.groupby('id').agg({ 'events': lambda x: ' '.join(x)}).reset_index()

The output I want:

id events
a0 a-markets-l1 a-markets-watch a-markets-buy a-markets-sell
c7 a-markets-z2 scroll_down
b2 next_screen
    Does this answer your question? [Concatenating string by rows in pyspark](https://stackoverflow.com/questions/41788919/concatenating-string-by-rows-in-pyspark) – ScootCork May 15 '22 at 16:50

1 Answer

I have tried using collect_set:

import pyspark.sql.functions as f

df.groupBy("id").agg(f.collect_set("events").alias("events"))
