Creating multiple dataframes which contain subsets of an original pandas dataframe

Question

I have a dataframe (called df) and want to split it into multiple dataframes based on the values in one of the columns.

I think the syntax would be:

month = df.month.unique().tolist()
for item in month:
    [item] = df[df[month]==[item]]

unutbu · Accepted Answer · 2014-12-02T22:21:42.263


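Iterating over a DataFrameGroupBy object yields (key, sub-DataFrame) pairs, one per distinct value of the grouping column. Passing the column name as a plain string (rather than a one-element list) keeps each key a scalar instead of a 1-tuple in recent pandas versions:

for month, subdf in df.groupby('month'):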
    ...
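
To refer to each piece by name afterwards, one option (a sketch, not part of the original answer) is to collect the groups into a dict keyed by month. The sample data and the name `frames` are illustrative assumptions.

```python
import pandas as pd

# Illustrative sample data (an assumption, not the asker's real DataFrame).
df = pd.DataFrame({'month': ['Jan', 'Jan', 'Feb'], 'val': [1, 2, 3]})

# Grouping by a single column name (a string, not a list) yields scalar keys.
# `frames` is a hypothetical name: it maps each month to its sub-DataFrame.
frames = {month: subdf for month, subdf in df.groupby('month')}

jan = frames['Jan']   # rows where month == 'Jan'
feb = frames['Feb']   # rows where month == 'Feb'
print(jan)
```

A dict avoids creating one variable per month, and `frames.keys()` lists the months that are actually present.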

If instead you want random access to the sub-DataFrames, specified by month, you could change the month column into an index:

df = df.set_index(['month'])

and then you could select rows by month with:

df.loc[month]

For example,

In [3]: import pandas as pd
In [4]: df = pd.DataFrame({'month': ['Jan','Jan','Feb'], 'val': [1,2,3]})
In [6]: df = df.set_index(['month'])

Given this DataFrame:

In [7]: df
Out[7]: 
       val
month     
Jan      1
Jan      2
Feb      3

This selects the rows where the month (index) is 'Jan':

In [8]: df.loc['Jan']
Out[8]: 
       val
month     
Jan      1
Jan      2
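
One caveat worth checking in your own pandas version: when a label occurs only once in the index (such as 'Feb' here), `df.loc[label]` may return a Series rather than a DataFrame. A minimal sketch that sidesteps this, assuming the same example data, passes the label inside a list:

```python
import pandas as pd

# Same example data as above, with `month` moved into the index.
df = pd.DataFrame({'month': ['Jan', 'Jan', 'Feb'], 'val': [1, 2, 3]})
df = df.set_index('month')

# A list of labels selects a DataFrame, even when only one row matches.
feb = df.loc[['Feb']]
jan = df.loc[['Jan']]

print(type(feb).__name__)
print(jan)
```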

edited Dec 02 '14 at 22:21

answered Dec 02 '14 at 20:30


unutbu - thanks! How does one refer to the individual dataframes now generated? I.e. January, February, March, etc? – yoshiserry Dec 02 '14 at 21:37
If you want random access to the sub-dataframes selected by month values, then I think it would be best to make `month` the index. Then you could select the sub-dataframe with `df.loc[month]`. I've edited the post above to show what I mean. – unutbu Dec 02 '14 at 21:54
unutbu, so df.loc[month], where month is the dataframe name (and a value from the month_date column) is how you would show just the january dataframe? – yoshiserry Dec 02 '14 at 22:15
@yoshiserry: Yes, exactly. – unutbu Dec 02 '14 at 22:22
unutbu -- Thank you! so .loc is for selecting rows, and .ix can be used for selecting rows, or columns or both? – yoshiserry Dec 02 '14 at 22:42
unutbu -- Thank you! Can you please clarify? As I understand it, .loc is for selecting rows, and .ix can be used for selecting rows, columns, or both. I think .loc doesn't create a new dataframe; it just lets you select a part of the original one (like just the January part), so there is still only one dataframe. Does your groupby code actually create separate dataframes, as opposed to selecting part of the original like .loc does? How does one reference the actual dataframe for just the January data (which is separate from the dataframe df)? – yoshiserry Dec 02 '14 at 22:53
In general, because arbitrary rows get selected, both `groupby` and `df.loc` return new DataFrames. `df.loc` can select rows and columns based on values. `df.iloc` can select rows and columns based on ordinal location. Now that Pandas has `loc` and `iloc` I don't recommend ever using `ix`, since its behavior is not immediately apparent from the syntax. – unutbu Dec 03 '14 at 00:37
OK, thanks unutbu - I'll use iloc and loc from now on for rows and columns. So, given that loc and groupby return new dataframes, how does one access the January dataframe? Is it still just df.loc[January], as in your earlier comment? – yoshiserry Dec 03 '14 at 01:10
@unutbu, can you look at this? It's a similar question... thanks http://stackoverflow.com/questions/31927309/python-pandas-create-multiple-dataframes-from-list – Merlin Aug 11 '15 at 15:45
