Let's assume the original data looks like this:
Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15
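For reproducibility, here is a minimal sketch that builds this sample data with pandas (prices kept as strings so the £ sign survives; column names taken from the table above):

import pandas as pd

df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1", "Comp2", "Comp2", "Comp2", "Comp3", "Comp3", "Comp3"],
    "Region":     ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "ProductA":   ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB":   ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})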
I wish to get a list of sub-dataframes split by the values of a column, say Region, like:
df_A:
Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16
In Python (with pandas) I could do:
for region, df_region in df.groupby('Region'):
    print(df_region)
Can I do the same iteration if df is a PySpark DataFrame?
In PySpark, once I do df.groupBy("Region") I get a GroupedData object. I don't need any aggregation like count, mean, etc. I just need a list of sub-dataframes, each having the same "Region" value. Is that possible?
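One workaround, sketched below, would be to collect the distinct Region values to the driver and filter once per value; I'm not sure it is idiomatic or efficient (each filter is lazy, so the data gets re-scanned per region when an action runs):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(df)  # PySpark DataFrame built from the pandas df above

# Collect distinct Region values, then build one filtered sub-dataframe per value.
regions = [row["Region"] for row in sdf.select("Region").distinct().collect()]
sub_dfs = {r: sdf.filter(sdf["Region"] == r) for r in regions}

for region, df_region in sub_dfs.items():
    df_region.show()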