
I have a PySpark DataFrame and want to make several sub-DataFrames using a groupBy operation. For example, I have a DF like

       subject  relation object 
DF =      s1       p       o1
          s2       p       o2
          s3       q       o3
          s4       q       o4

and want to get sub-DataFrames that each share the same relation value, like

       subject  relation object 
DF1 =      s1       p       o1
           s2       p       o2
       subject  relation object 
DF2 =      s3       q       o3
           s4       q       o4
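
For reference, a minimal sketch for building this example DataFrame (assuming an existing SparkSession named spark):

DF = spark.createDataFrame(
    [('s1', 'p', 'o1'),
     ('s2', 'p', 'o2'),
     ('s3', 'q', 'o3'),
     ('s4', 'q', 'o4')],
    ['subject', 'relation', 'object'])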

I would appreciate it if you could share your ideas on how to make sub-DataFrames using groupBy().

Thanks

youngtackpark

1 Answer


Note that groupBy() itself returns a GroupedData object meant for aggregation, so you cannot filter or select on it directly. Instead, you can collect the distinct relation values and filter the original DataFrame for each one to build a list of sub-DataFrames:

df_list = []
for row in DF.select('relation').distinct().sort('relation').collect():
    current_relation = row['relation']
    # keep only the rows of the original DataFrame with this relation value
    df_list.append(DF.filter(DF['relation'] == current_relation))
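
As a quick check (a hypothetical usage sketch, assuming the df_list built above), each element is an ordinary DataFrame that can be shown or processed independently; with the sorted relation values, df_list[0] would hold the 'p' rows and df_list[1] the 'q' rows:

for sub_df in df_list:
    # each sub_df contains the rows for exactly one relation value
    sub_df.show()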
SchwarzeHuhn