I have created a dictionary by grouping some rain gauges by their code, using this line:

dict_of_gauges = {k: v for k, v in PE_14.groupby('gauge_code')}

which gave me some entries like the ones shown below

 11800  261070705A  PAULISTA    PE 2014-08-21 17:10:00      0.2
 11801  261070705A  PAULISTA    PE 2014-08-21 17:20:00      0.0
 11802  261070705A  PAULISTA    PE 2014-08-21 17:30:00      0.2
 11803  261070705A  PAULISTA    PE 2014-08-21 17:40:00      0.0
 11804  261070705A  PAULISTA    PE 2014-08-21 18:00:00      0.0
[3966 rows x 5 columns],
 '261070704A':        gauge_code      city state            datetime  rain_mm
 11493  261070704A  PAULISTA    PE 2014-08-21 21:20:00      0.2
 11494  261070704A  PAULISTA    PE 2014-08-21 21:30:00      0.0
 11495  261070704A  PAULISTA    PE 2014-08-21 21:40:00      0.0
 11496  261070704A  PAULISTA    PE 2014-08-21 21:50:00      0.0
 11497  261070704A  PAULISTA    PE 2014-08-21 22:00:00      0.0
[4180 rows x 5 columns],

Now I want to create a dataframe for each one of them and assign names like "df1", "df2", etc., but I don't know how to do that inside a for loop. The code I ended up using was

df1 = pd.DataFrame.from_records(dict_of_gauges['261070703A'])
df2 = pd.DataFrame.from_records(dict_of_gauges['261070705A'])
.
.
.

But it's not very professional to repeat the same thing so many times. I don't know how to assign those names, and the pseudocode I tried (below) didn't work: as expected, it overwrote "df" on every iteration.

listdfs = ['df0','df1','df2','df3','df4','df5']
for df, gauge in zip(listdfs, dict_of_gauges):
    df = pd.DataFrame.from_records(dict_of_gauges[gauge]) 

Could someone please shed some light on this?

  • It's simpler and easier to leverage the dictionary structure you already have and just rename the keys to something relevant and easy to call upon, e.g. change '261070703A' to 'df1'. You keep the benefit of being able to update names programmatically, along with the very useful dictionary structure. – katardin May 18 '20 at 19:37
  • Would this list comprehension help? One way or another you'll end up with a list of dataframes: `dframes = [pd.DataFrame.from_records(dict_of_gauges[key]) for key in dict_of_gauges.keys()]` – Dagrooms May 18 '20 at 19:40
  • Does this answer your question? [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – G. Anderson May 18 '20 at 19:40

1 Answer

Probably the best way to do this is to just use the result of groupby().

>>> gb = PE_14.groupby('gauge_code')
>>> df0 = gb.get_group("261070705A")     # Get a single group.
>>> list(gb.groups)
['261070705A', '261070704A', ...]

So if you need to loop through the different groups, do something like

>>> for key in gb.groups:
...     groupdf = gb.get_group(key)
...     # ... do something with the group data frame.
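As a side note, iterating over the groupby object itself yields `(key, dataframe)` pairs, so the `get_group` call can be skipped. A minimal sketch, assuming a tiny made-up stand-in for `PE_14` (the values and the `totals` dictionary are only for illustration):

```python
import pandas as pd

# Hypothetical miniature stand-in for PE_14; the real frame has more columns and rows.
PE_14 = pd.DataFrame({
    "gauge_code": ["261070705A", "261070705A", "261070704A"],
    "rain_mm": [0.2, 0.0, 0.2],
})

totals = {}
# Each iteration yields the group key and the sub-dataframe for that key.
for code, group_df in PE_14.groupby("gauge_code"):
    totals[code] = group_df["rain_mm"].sum()
```

Here `totals` ends up with one entry per gauge code, without ever naming the individual dataframes.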

Creating a new numbered variable on every pass of a for loop is awkward in Python, but a dictionary of dataframes keeps them easy to track without recomputing the groupby every time. In fact, that's what you've already done in your first line of code! Instead of a variable like df1, just use dict_of_gauges['261070705A']. If that's too verbose and you don't care about the key, you can also put them in a list:

>>> gauge_dfs = [gb.get_group(key) for key in gb.groups]

On the other hand, dict_of_gauges['261070705A'] is more readable than gauge_dfs[0].
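If you still want short names like "df1" and "df2", one option (along the lines of katardin's comment) is to use those names as dictionary keys instead of creating variables. This is only a sketch on a made-up stand-in for `PE_14`; the names `dfs` and `label_to_code` are my own and not part of your code:

```python
import pandas as pd

# Hypothetical miniature stand-in for PE_14.
PE_14 = pd.DataFrame({
    "gauge_code": ["261070705A", "261070705A", "261070704A"],
    "city": ["PAULISTA"] * 3,
    "rain_mm": [0.2, 0.0, 0.2],
})

dfs = {}            # "df1", "df2", ... -> dataframe for one gauge
label_to_code = {}  # "df1", "df2", ... -> original gauge code
# groupby sorts the group keys by default, so the numbering is deterministic.
for i, (code, group_df) in enumerate(PE_14.groupby("gauge_code"), start=1):
    label = f"df{i}"
    dfs[label] = group_df
    label_to_code[label] = code
```

You would then write `dfs["df1"]` where you wanted `df1`, and `label_to_code["df1"]` tells you which gauge it came from.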

vanPelt2