Python Pandas Create Multiple dataframes from list

Question

Using this as a quick starting point:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html

In [1]: df
Out[1]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

Then isolating 'A' gives this:

In [2]: df[df['variable'] == 'A']
Out[2]: 
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059

Creating a new dataframe from that would be:

dfA = df[df['variable'] == 'A']

Likewise, for B:

dfB = df[df['variable'] == 'B']

So I want to isolate the dataframes into dfA, dfB, dfC, and so on:

dfList  = list(set(df['variable']))
dfNames = ["df" + row for row in dfList]  

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfNames[i] = dfNew

It runs, but when I try dfA I get the error "dfA" is not defined.

You are writing `dfNew` into `dfNames[i]`, not `dfA`. It's roughly equivalent to the difference between `dfA` and `"dfA"`. I don't know if an exact solution to your question is possible in python due to the lack of macros. You maybe could do this with a context manager? But really, I would think about doing it another way. It might help if you could give some more context for the overall issue. — JohnE, Aug 11 '15 at 03:18
@JohnE Thanks, it's much harder than it looks. I am trying to create everything dynamically and segment out the smaller arrays so I can pickle them. In the above example, I am simply trying to find a way to break those four categories out into separate dfs or arrays. Thanks for actually reading the code. — Merlin, Aug 11 '15 at 11:32
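
A minimal sketch of what the loop in the question actually does, assuming a cut-down two-group version of the example data (the names `keys` and `names` are illustrative stand-ins for dfList and dfNames). Assigning to a list slot replaces the string stored there; it never binds a variable with that name:

```python
import pandas as pd

# Cut-down stand-in for the question's data (assumption: two groups only).
df = pd.DataFrame({
    "variable": ["A", "A", "B", "B"],
    "value": [0.469112, -0.282863, -1.135632, 1.212112],
})

keys = sorted(set(df["variable"]))      # the distinct labels
names = ["df" + key for key in keys]    # a list of plain strings

for i, key in enumerate(keys):
    # Overwrites the string at position i with a DataFrame.
    # No variable called dfA or dfB is created by this line.
    names[i] = df[df["variable"] == key]

print(type(names[0]).__name__)   # DataFrame
print("dfA" in globals())        # False
```

The DataFrames do exist after the loop, but only as elements of the list, so they are reachable as `names[0]` and `names[1]` rather than as dfA and dfB.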

score 5 · Answer 1 · answered Aug 10 '15 at 19:20

Use groupby and get_group, e.g.:

grouped = df.groupby('variable')

Then when you want to do something with each group, access it as such:

my_group = grouped.get_group('A')

Gives you:

        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059
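
When the group labels are not known ahead of time, they can be read off the grouped object rather than typed in. A small sketch, assuming the same twelve-row frame as in the question (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
    "variable": list("AAABBBCCCDDD"),
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, -2.104569, -0.494929, 1.071804],
})

grouped = df.groupby("variable")

# grouped.groups maps each label to the index labels of its rows,
# so its keys are the labels that were found in the data.
keys = list(grouped.groups)
print(keys)   # ['A', 'B', 'C', 'D']

for key in keys:
    sub = grouped.get_group(key)   # the rows belonging to this label
    print(key, len(sub))
```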

Jon Clements

does not really answer the question.. A list into multiple dataframes... Trying not to use groupby, trying to solve above question. – Merlin Aug 10 '15 at 19:26
Any particular reason you want to create dynamically named variables and perform a linear scan of the original dataframe as many times as the number of unique values of `variables` + 1? @Merlin? – Jon Clements Aug 10 '15 at 19:34

score 4 · Accepted Answer · edited May 23 '17 at 12:16

To answer your question literally, globals()['dfA'] = dfNew would define dfA in the global namespace:

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    globals()[dfName] = dfNew

However, there is never a good reason to define dynamically-named variables.

If the names are not known until runtime -- that is, if the names are truly dynamic -- then you can't use the names in your code since your code has to be written before runtime. So what's the point of creating a variable named dfA if you can't refer to it in your code?
If, on the other hand, you know beforehand that you will have a variable named dfA, then your code isn't really dynamic. You have static variable names. The only reason to use the loop is to cut down on boilerplate code. However, even in this case, there is a better alternative. The solution is to use a dict (see below) or list¹.
Adding dynamically-named variables pollutes the global namespace.
It does not generalize well. If you had 100 dynamically named variables, how would you access them? How would you loop over them?
To "manage" dynamically named variables you would need to keep a list of their names as strings, e.g. ['dfA', 'dfB', 'dfC',...], and then access the newly minted global variables via the globals() dict, e.g. globals()['dfA']. That is awkward.

So the conclusion programmers reach through bitter experience is that dynamically-named variables are somewhere between awkward and useless and it is much more pleasant, powerful, practical to store key/value pairs in a dict. The name of the variable becomes a key in the dict, and the value of the variable becomes the value associated with the key. So, instead of having a bare name dfA you would have a dict dfs and you would access the dfA DataFrame via dfs['dfA']:

dfs = dict()
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfs[dfName] = dfNew

or, as Jianxun Li shows,

dfs = {k: g for k, g in df.groupby('variable')}

This is why Jon Clements and Jianxun Li answered your question by showing alternatives to defining dynamically-named variables. It's because we all believe it is a terrible idea.

Using Jianxun Li's solution, to loop over a dict's key/value pairs you could then use:

dfs = {k: g for k, g in df.groupby('variable')}
for key, group in dfs.items():
    ...

or using Jon Clements' solution, to iterate through groups you could use:

grouped = df.groupby('variable')
for key, group in grouped:
    ...
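
The loop bodies above are left open. As one possibility, given the goal mentioned in the comments of pickling each segment, the body could write every group to its own file. This is only a sketch: the temporary directory, the `paths` dict and the df<key>.pkl file naming are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
    "variable": list("AAABBBCCCDDD"),
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, -2.104569, -0.494929, 1.071804],
})

out_dir = Path(tempfile.mkdtemp())   # hypothetical output location

paths = {}
for key, group in df.groupby("variable"):
    path = out_dir / f"df{key}.pkl"  # hypothetical naming scheme
    group.to_pickle(path)            # one pickle file per group
    paths[key] = path

# Reading one segment back gives an ordinary DataFrame again.
restored = pd.read_pickle(paths["A"])
print(restored)
```

The keys never have to be written out in the code, so the same loop works however many distinct values the column holds.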

¹If the names are numbered or ordered you could use a list instead of a dict.
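
A short sketch of the list variant mentioned in the footnote, assuming the order of the groups is all that matters (the name `frames` is illustrative, and the data is a cut-down version of the question's frame):

```python
import pandas as pd

df = pd.DataFrame({
    "variable": list("AAABBBCCCDDD"),
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, -2.104569, -0.494929, 1.071804],
})

# Keep only the sub-frames and discard the keys;
# groupby yields them in sorted key order by default.
frames = [group for _, group in df.groupby("variable")]

print(len(frames))   # 4
print(frames[0])
```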

Using Jianxun Li's answer and your modification of it, how would I loop over the dict and isolate each 'k' into its own dataframe? I am trying to generalize a solution, so I don't know the keys ahead of time. I can drop the "df" in 'dfA'. — Merlin, Aug 11 '15 at 18:01
To loop over a dict, `dfs`, use `for key, values in dfs.items()` (in Python3), or `for key, values in dfs.iteritems()` (in Python2). I don't understand what "isolate the 'k' into their own dataframe" means. If your question is a clarification of the current question, please add it to your question above. If it is a follow-up question, please consider asking it as a new question. — unutbu, Aug 11 '15 at 18:13
Isolate each key into its own dataframe.. So, there are 4 keys, then there would be 4 dataframes. — Merlin, Aug 11 '15 at 18:18
Then use the `for key, value in dfs.items()` loop as shown above. — unutbu, Aug 11 '15 at 18:22
Regarding your suggested edit: My answer was really an attempt to convince you to not use `globals()`. Therefore I do not want to add `globals` there since I am NOT advocating the use of `globals()`. Once you have the `dict` as in Jianxun Li's answer, there is **no need for** using `globals()[key] = df`. Anywhere you would need `A`, you would use `dfs['A']` instead. — unutbu, Aug 11 '15 at 20:38
Fine, But I cant create 'A' as a stand alone dataframe using dfs['A'] and that is what I need.. — Merlin, Aug 11 '15 at 20:45
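
For what it is worth, each value stored in the dict is already a complete DataFrame, so it can be bound to an ordinary name whenever one is needed. A minimal sketch, assuming the data from the question; the `.copy()` call is an addition here, not part of the answers above, and is only there so that later edits to a sub-frame cannot touch the original:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
    "variable": list("AAABBBCCCDDD"),
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, -2.104569, -0.494929, 1.071804],
})

# One independent DataFrame per key.
dfs = {key: group.copy() for key, group in df.groupby("variable")}

a = dfs["A"]                    # an ordinary name bound to a DataFrame
print(type(a).__name__)         # DataFrame

a["value"] = 0.0                # changes the copy only
print(df.loc[0, "value"])       # 0.469112
```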

Jianxun Li · Answer 3 · 2015-08-10T20:44:20.147

df.groupby('variable') returns a GroupBy object that yields key/DataFrame pairs when you iterate over it. So to get a dict of subgroups,

result = {k: g for k, g in df.groupby('variable')}

from pprint import pprint
pprint(result)

{'A':          date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091,
 'B':          date variable   value
3  2000-01-03        B -1.1356
4  2000-01-04        B  1.2121
5  2000-01-05        B -0.1732,
 'C':          date variable   value
6  2000-01-03        C  0.1192
7  2000-01-04        C -1.0442
8  2000-01-05        C -0.8618,
 'D':           date variable   value
9   2000-01-03        D -2.1046
10  2000-01-04        D -0.4949
11  2000-01-05        D  1.0718}


result['A']

         date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091

khaoula kadri · Answer 4 · 2020-03-18T17:07:27.440

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    vars()[dfNames[i]] = dfNew

edited Mar 18 '20 at 17:07

answered Mar 18 '20 at 15:55

The vars() function will take the values in dfNames and turn them into variables – khaoula kadri Mar 18 '20 at 15:59
please, give an explanation to your answer – ionpoint Mar 18 '20 at 16:03
The output of the code above is " 'dfA' is not defined " , that's because he is trying to assign a dataframe to a string (which is 'dfA' ). So to convert strings to variables we need to use the vars() function. In this case for example dfNames[i] = "dfA", so when we apply vars()[dfNames[i]] will return a variable dfA . – khaoula kadri Mar 18 '20 at 17:22
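
One caveat about the approach in this answer: called with no arguments, vars() behaves like locals(). At module level that is the same dictionary as globals(), which is why the loop above appears to work there. Inside a function, writing to that dictionary does not create a usable local variable. A small sketch (the function and the name `df_local` are made up for illustration):

```python
def make_local():
    # Inside a function, vars() with no arguments behaves like locals().
    # Writing into the returned dict does not create a real local variable.
    vars()["df_local"] = "placeholder"
    try:
        # df_local is never assigned in this function, so Python looks it
        # up as a global, and no such global exists.
        return df_local
    except NameError:
        return "NameError"

print(make_local())   # NameError
```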
