
I have a while loop that uses a counter to generate separate dataframes based on the value in a column.

Because this happens inside a while loop, the dataframe is reassigned to the same name on every iteration, which overwrites what was there, whereas I want to keep the separate dataframes. I'm trying to find a way to generate the dataframe names from the while loop's counter.

I have done a cluster analysis which has n clusters. The data has a column named 'Cluster' which holds the cluster number each observation is assigned to. I used

maxClusters = max(Clusters['Cluster'])

to get my number of clusters and my working while loop is as follows:

while ClusterNo < maxClusters:
    Cluster = Clusters[Clusters['Cluster'] == ClusterNo]
    ClusterNo += 1

This obviously keeps overwriting 'Cluster' on each of the 10 iterations, but I want to keep all the separate dataframes. To do that I have been using

while ClusterNo < maxClusters:
    'Cluster' + str(ClusterNo) = Clusters[Clusters['Cluster'] == ClusterNo]
    ClusterNo += 1

to try to generate a different dataframe for each cluster, but this throws a syntax error.

My expected output is n dataframes, one per cluster, selected from the raw imported data and each assigned to a different name.

Callum Smyth
  • Why not `cluster_dict = {}` and then `cluster_dict[ClusterNo] = Clusters[Clusters['Cluster'] == ClusterNo]`? – CJR Jan 08 '19 at 15:09
  • @CJ59 Thanks for replying. Can you please explain how I can access each dataframe on its own from the dictionary? For instance, I want to use this to match the clusters with some new raw data downstream. I've just moved over from R and haven't encountered dictionaries before, so I'm kind of at a loss with them – Callum Smyth Jan 08 '19 at 15:19
  • I had tried `dictofCluster = {k: v for k, v in Clusters.groupby('Cluster')}` which gives the same output as suggested. But once again I don't know what to do with the dicts once they're generated – Callum Smyth Jan 08 '19 at 15:26
  • The dictionary contains references to all the dataframes that you generate. So `cluster_dict[0]` would be the DataFrame corresponding to `ClusterNo = 0` and so on and so forth. – CJR Jan 08 '19 at 15:37
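
To make the dictionary suggestion from the comments concrete, here is a minimal sketch. The sample data and the `value` column are made up for illustration; only the `Clusters` dataframe name, the `Cluster` column and the `cluster_dict` name come from the thread. It builds one dataframe per cluster, then pulls out a single cluster by number and loops over all of them.

```python
import pandas as pd

# Hypothetical stand-in for the clustered data; the 'value' column is invented.
Clusters = pd.DataFrame({
    "Cluster": [0, 1, 0, 2, 1, 2],
    "value": [10, 20, 30, 40, 50, 60],
})

# One dataframe per cluster, keyed by the cluster number.
cluster_dict = {k: v for k, v in Clusters.groupby("Cluster")}

# Access a single cluster's dataframe by its number.
cluster_0 = cluster_dict[0]

# Loop over every cluster, e.g. to count the rows in each one.
sizes = {}
for cluster_no, frame in cluster_dict.items():
    sizes[cluster_no] = len(frame)
```

The dictionary key plays the role that the generated variable name (`Cluster0`, `Cluster1`, ...) was meant to play, so downstream code can look up a cluster with `cluster_dict[n]` instead of relying on dynamically created names.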
