
Using this as a quick starting point:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html

In [1]: df
Out[1]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

Then isolating 'A' gives this:

In [2]: df[df['variable'] == 'A']
Out[2]: 
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059

Now creating a new dataframe would be:

dfA = df[df['variable'] == 'A'] 

Let's say B's would be:

dfB = df[df['variable'] == 'B'] 

So, isolating the dataframes into dfA, dfB, dfC, ...:

dfList  = list(set(df['variable']))
dfNames = ["df" + row for row in dfList]  

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfNames[i] = dfNew      

It runs... but when I try to use dfA I get the error that "dfA" is not defined.

Merlin
  • You are writing `dfNew` into `dfNames[i]`, not `dfA`. It's roughly equivalent to the difference between `dfA` and `"dfA"`. I don't know if an exact solution to your question is possible in Python due to the lack of macros. You could maybe do this with a context manager? But really, I would think about doing it another way. It might help if you could give some more context for the overall issue. – JohnE Aug 11 '15 at 03:18
  • @JohnE Thanks, it's much harder than it looks. I am trying to create everything dynamically and segment out the smaller arrays so I can pickle them. In the above example, I'm simply trying to find a way to break out those four categories into separate dfs or arrays. Thanks for actually reading the code. – Merlin Aug 11 '15 at 11:32
  • @JohnE look at accepted ans. – Merlin Aug 11 '15 at 20:24
  • yep, that is thorough – JohnE Aug 11 '15 at 20:43
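To make JohnE's point concrete, here is a minimal reproduction of the question's loop. It rebuilds a small stand-in for the question's df (the values are placeholders, not the original numbers) and uses sorted() instead of list(set(...)) only so that the key order is deterministic. After the loop, the list holds DataFrames where the strings used to be, and no variable named dfA exists.

```python
import pandas as pd

# Stand-in for the question's df: three dates for each of A-D.
# The values are placeholders, not the numbers from the question.
dates = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4)
df = pd.DataFrame({
    "date": dates,
    "variable": [v for v in "ABCD" for _ in range(3)],
    "value": range(12),
})

dfList = sorted(set(df["variable"]))      # ['A', 'B', 'C', 'D']
dfNames = ["df" + row for row in dfList]  # ['dfA', 'dfB', 'dfC', 'dfD']

for i, row in enumerate(dfList):
    dfNew = df[df["variable"] == row]
    # This replaces the string "dfA" inside the list; it does not bind a name.
    dfNames[i] = dfNew

print(type(dfNames[0]))    # the list now holds DataFrames, not names
print("dfA" in globals())  # the name dfA was never created
```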

4 Answers


Use groupby and get_group, e.g.:

grouped = df.groupby('variable')

Then when you want to do something with each group, access it as such:

my_group = grouped.get_group('A')

Gives you:

        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059
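A self-contained sketch of the groupby/get_group approach, using only the A and B rows from the question as a small stand-in for the full df. It also shows that the group keys can be read from the GroupBy object itself, so they do not have to be known ahead of time.

```python
import pandas as pd

# Small stand-in for the question's df (A and B rows only).
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 2),
    "variable": ["A"] * 3 + ["B"] * 3,
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215],
})

grouped = df.groupby("variable")

# get_group returns one sub-DataFrame; the original index labels are kept.
my_group = grouped.get_group("A")
print(my_group)

# grouped.groups maps each key to its row labels, so the keys are
# available at runtime without hard-coding 'A', 'B', ...
for key in grouped.groups:
    print(key, len(grouped.get_group(key)))
```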
Jon Clements
  • Does not really answer the question: a list into multiple dataframes. I'm trying not to use groupby; I'm trying to solve the question above. – Merlin Aug 10 '15 at 19:26
  • Any particular reason you want to create dynamically named variables and perform a linear scan of the original dataframe as many times as the number of unique values of `variable` + 1, @Merlin? – Jon Clements Aug 10 '15 at 19:34

To answer your question literally, globals()['dfA'] = dfNew would define dfA in the global namespace:

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    globals()[dfName] = dfNew   

However, there is never a good reason to define dynamically-named variables.

  • If the names are not known until runtime -- that is, if the names are truly dynamic -- then you can't use the names in your code, since your code has to be written before runtime. So what's the point of creating a variable named dfA if you can't refer to it in your code?

  • If, on the other hand, you know beforehand that you will have a variable named dfA, then your code isn't really dynamic. You have static variable names. The only reason to use the loop is to cut down on boilerplate code. However, even in this case, there is a better alternative. The solution is to use a dict (see below) or a list[1].

  • Adding dynamically-named variables pollutes the global namespace.

  • It does not generalize well. If you had 100 dynamically named variables, how would you access them? How would you loop over them?

  • To "manage" dynamically named variables you would need to keep a list of their names as strings, e.g. ['dfA', 'dfB', 'dfC', ...], and then access the newly minted global variables via the globals() dict, e.g. globals()['dfA']. That is awkward.

So the conclusion programmers reach through bitter experience is that dynamically-named variables are somewhere between awkward and useless, and it is much more pleasant, powerful, and practical to store key/value pairs in a dict. The name of the variable becomes a key in the dict, and the value of the variable becomes the value associated with the key. So, instead of having a bare name dfA you would have a dict dfs, and you would access the dfA DataFrame via dfs['dfA']:

dfs = dict()
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfs[dfName] = dfNew   

or, as Jianxun Li shows,

dfs = {k: g for k, g in df.groupby('variable')}

This is why Jon Clements and Jianxun Li answered your question by showing alternatives to defining dynamically-named variables. It's because we all believe it is a terrible idea.


Using Jianxun Li's solution, to loop over a dict's key/value pairs you could then use:

dfs = {k: g for k, g in df.groupby('variable')}
for key, group in dfs.items():   # 'group' avoids shadowing the original df
    ...

or using Jon Clements' solution, to iterate through groups you could use:

grouped = df.groupby('variable')
for key, group in grouped:       # 'group' avoids shadowing the original df
    ...

[1] If the names are numbered or ordered, you could use a list instead of a dict.
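For the list case mentioned in the footnote, a minimal sketch (the data is made up): keep the frames in one list and the keys in a parallel list. groupby sorts group keys by default (sort=True), so positions follow sorted key order.

```python
import pandas as pd

df = pd.DataFrame({
    "variable": ["B", "A", "B", "A", "C"],
    "value": [1, 2, 3, 4, 5],
})

# groupby sorts keys by default, so keys == ['A', 'B', 'C'].
keys = [k for k, _ in df.groupby("variable")]
frames = [g for _, g in df.groupby("variable")]

print(keys)
print(frames[0])  # the rows where variable == 'A'
```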

unutbu
  • Using Jianxun Li's answer with your modification to it: how would I loop over the dict and isolate each 'k' into its own dataframe? I am trying to generalize a solution, so I don't know the keys ahead of time. I can drop the "df" in 'dfA'. – Merlin Aug 11 '15 at 18:01
  • To loop over a dict, `dfs`, use `for key, values in dfs.items()` (in Python 3), or `for key, values in dfs.iteritems()` (in Python 2). I don't understand what "isolate the 'k' into their own dataframe" means. If your question is a clarification of the current question, please add it to your question above. If it is a follow-up question, please consider asking it as a new question. – unutbu Aug 11 '15 at 18:13
  • Isolate each key into its own dataframe. So if there are 4 keys, there would be 4 dataframes. – Merlin Aug 11 '15 at 18:18
  • Then use the `for key, value in dfs.items()` loop as shown above. – unutbu Aug 11 '15 at 18:22
  • Regarding your suggested edit: My answer was really an attempt to convince you to not use `globals()`. Therefore I do not want to add `globals` there since I am NOT advocating the use of `globals()`. Once you have the `dict` as in Jianxun Li's answer, there is **no need for** using `globals()[key] = df`. Anywhere you would need `A`, you would use `dfs['A']` instead. – unutbu Aug 11 '15 at 20:38
  • Fine, but I can't create 'A' as a standalone dataframe using dfs['A'], and that is what I need. – Merlin Aug 11 '15 at 20:45
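Since the goal stated in the comments under the question is to pickle each smaller segment, the dict can be written out directly, one file per key, without creating any named variables; a single segment can later be reloaded as an ordinary, standalone DataFrame. A minimal sketch, assuming a hypothetical file-naming scheme df<key>.pkl inside a temporary directory (the data is a small made-up sample):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "variable": ["A", "A", "B", "C", "C", "D"],
    "value": [0.47, -0.28, -1.14, 0.12, -1.04, -2.10],
})

dfs = {k: g for k, g in df.groupby("variable")}

out_dir = tempfile.mkdtemp()
paths = {}
for key, group in dfs.items():
    # Hypothetical naming scheme: one pickle per key, e.g. dfA.pkl
    path = os.path.join(out_dir, f"df{key}.pkl")
    group.to_pickle(path)
    paths[key] = path

# Reload one segment later as a plain, standalone DataFrame.
dfA = pd.read_pickle(paths["A"])
print(dfA)
```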

df.groupby('variable') returns a GroupBy object; iterating over it yields (key, sub-DataFrame) pairs. So to get a list/dict of subgroups:

result = {k: g for k, g in df.groupby('variable')}

from pprint import pprint
pprint(result)

{'A':          date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091,
 'B':          date variable   value
3  2000-01-03        B -1.1356
4  2000-01-04        B  1.2121
5  2000-01-05        B -0.1732,
 'C':          date variable   value
6  2000-01-03        C  0.1192
7  2000-01-04        C -1.0442
8  2000-01-05        C -0.8618,
 'D':           date variable   value
9   2000-01-03        D -2.1046
10  2000-01-04        D -0.4949
11  2000-01-05        D  1.0718}


result['A']

         date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091
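On the "standalone dataframe" concern raised in the comments: each value in result is an ordinary DataFrame, so it can be processed in a loop when the keys are not known ahead of time, or bound to a plain name when one key is needed. A small sketch using a subset of the question's rows:

```python
import pandas as pd

df = pd.DataFrame({
    "variable": ["A", "A", "B", "C", "D", "D"],
    "value": [0.4691, -0.2829, -1.1356, 0.1192, -2.1046, -0.4949],
})

result = {k: g for k, g in df.groupby("variable")}

# Keys are discovered at runtime; nothing about 'A'..'D' is hard-coded here.
for key, frame in result.items():
    print(key, type(frame).__name__, len(frame))

# When a specific key is known, bind it to an ordinary name.
dfA = result["A"]
print(dfA)
```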
Jianxun Li
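for i, row in enumerate(dfList):
    dfName = dfNames[i]                 # e.g. "dfA"
    dfNew = df[df['variable'] == row]
    # vars() with no argument returns locals(); at module level that is
    # globals(), so this binds dfA, dfB, ... (it won't work inside a function)
    vars()[dfName] = dfNew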
  • The vars() function takes the strings in dfNames and turns them into variables. – khaoula kadri Mar 18 '20 at 15:59
  • Please give an explanation for your answer. – ionpoint Mar 18 '20 at 16:03
  • The code in the question fails with " 'dfA' is not defined " because it assigns a dataframe into the list slot that held the string 'dfA', instead of creating a variable named dfA. To convert strings to variable names we can use the vars() function. In this case, for example, dfNames[i] = "dfA", so vars()[dfNames[i]] = dfNew creates a variable dfA. – khaoula kadri Mar 18 '20 at 17:22
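A caveat on the vars() approach: vars() with no argument returns locals(). At module level (a script or notebook) that is the same dict as globals(), which is why the answer works there. Inside a function, writing into that dict does not create a usable variable. The sketch below demonstrates this; split_in_function and split_to_dict are hypothetical helper names, and the dict version from the other answers is included for contrast.

```python
import pandas as pd

df = pd.DataFrame({
    "variable": list("AAABBB"),
    "value": [1, 2, 3, 4, 5, 6],
})


def split_in_function(frame):
    """Try the vars() trick inside a function (hypothetical helper)."""
    for key in frame["variable"].unique():
        vars()["df" + key] = frame[frame["variable"] == key]
    try:
        # dfA is not assigned in this function, so Python looks it up as a
        # global name; the vars() write above never creates it.
        return dfA  # noqa: F821
    except NameError:
        return None


def split_to_dict(frame):
    """The dict approach works the same at module level or in a function."""
    return {k: g for k, g in frame.groupby("variable")}


print(split_in_function(df))          # prints None: no variable dfA exists
print(split_to_dict(df)["A"])
```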