I have a PySpark df which looks like this:
CustomerID | CustomerName | StoreName |
---|---|---|
101 | Mike | ABC |
102 | Sarah | ABC |
103 | Alice | ABC |
104 | Michael | PQR |
105 | Abhi | PQR |
106 | Bill | XYZ |
107 | Roody | XYZ |
Now I want to separate the 3 stores into 3 separate dfs. For this I created a list of store names:
store_list = df.select("StoreName").distinct().rdd.flatMap(lambda x:x).collect()
Now I want to iterate through this list and filter each store into its own df:
for i in store_list:
    df_{i} = df.where(col("storeName") == i)
The code obviously has syntax errors, but that's the approach I have in mind. I want to avoid Pandas because the datasets are huge.
Can anyone help me with this?
Thanks