My data looks like this:
Store  ID  Amount
1      1   10
1      2   20
2      1   10
3      4   50
I have to create a separate directory for each store.

Store 1/accounts:

ID  Amount
1   10
2   20

Store 2/accounts:

ID  Amount
1   10
For this purpose, can I use a loop over a Spark DataFrame, as below? It works on my local machine. Will it be a problem on a cluster?
storecount = 1
while storecount <= 50:
    query = "SELECT * FROM Sales WHERE Store={}".format(storecount)
    df = spark.sql(query)
    # Each store gets its own output path so the saves don't collide.
    df.write.format("csv").save("Store {}/accounts".format(storecount))
    storecount = storecount + 1