I have a large dataframe with hierarchical indexing (a simplified example is provided in the code below). I would like to set up a loop or other automated way of splitting the dataframe into one subset per unique value of the outer index level (i.e. dfa, dfb, dfc etc. for the example below) and store the subsets in a list.

I have tried the following, but without success. Any help appreciated!

import numpy as np
import pandas as pd

data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 1, 2, 2, 3]])

split = []
for value in data.index.unique():
    split.append(data[data.index == value])
Omri374

1 Answer

I am not exactly sure if this is what you are looking for, but have you looked at the pandas `groupby` function? The crucial part is that it works on a MultiIndex: the `level` argument specifies which index level (or subset of levels) to group by. For example:

split = {}
for value, split_group in data.groupby(level=0):
    split[value] = split_group
print(split)

As @jezrael points out, a simpler way to do it is:

dict(tuple(data.groupby(level=0)))
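
As a minimal sketch of what the resulting dictionary looks like, assuming the `data` Series from the question (the seeded generator is an addition here, only to make runs repeatable): the keys are the labels of the outer index level and each value is the matching sub-Series. If a plain list is really wanted, as in the question, the dictionary values can be collected into one.

```python
import numpy as np
import pandas as pd

# Same layout as the question's example; the seed is an assumption added for repeatability.
rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(9),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 1, 2, 2, 3]],
)

# One (label, sub-Series) pair per unique value of the outer index level.
split = dict(tuple(data.groupby(level=0)))

print(list(split))      # ['a', 'b', 'c', 'd']
print(len(split['a']))  # 3

# A list of the pieces, in the same order as the keys.
split_list = list(split.values())
```

Each piece keeps both index levels, so `split['a']` still carries the inner labels 1, 2, 3.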
sophros
  • Apologies if this was not clear: I would like to have separate dataframes, such that dfa is a dataframe of all of the rows (3 rows) under index 'a', dfb is another dataframe of all the rows under index 'b', etc. – Novice Python charmer Jul 08 '19 at 09:35
  • So you have it - I have changed the level to 0 and this is precisely what you get in the `split_group`s. – sophros Jul 08 '19 at 09:37
  • How do I access the dataframes in the list? if I type 'a' for example nothing is returned. The intention was to use this list of dataframes to run regressions over. – Novice Python charmer Jul 08 '19 at 09:43
  • You can't use a list for that! I have changed the split to a dictionary which should give you what you need. – sophros Jul 08 '19 at 09:45
  • I'm still not able to access the data on the split dataframes, when I enter a to retrieve the first database it returns the error 'name 'a' is not defined' – Novice Python charmer Jul 08 '19 at 09:52
  • Read a bit more about [GroupBy](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) and define what you mean by "database". I have no idea what you are doing to get the error. – sophros Jul 08 '19 at 10:09
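
The `name 'a' is not defined` error would be consistent with typing the bare name `a`, which Python looks up as a variable; this is a guess, since the exact input was never shown. The pieces are stored in the dictionary under string keys, so they are retrieved with `split['a']`. The sketch below assumes the `data` Series from the question and also loops over the pieces. The straight-line fit with `np.polyfit` is only a hypothetical stand-in for whatever regression is actually intended.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    np.random.randn(9),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 1, 2, 2, 3]],
)
split = dict(tuple(data.groupby(level=0)))

# The key is the string 'a', not a variable named a.
dfa = split['a']

# Placeholder "regression": fit each piece against its inner index level.
slopes = {}
for key, group in split.items():
    x = group.index.get_level_values(1).to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, group.to_numpy(), 1)
    slopes[key] = slope

print(slopes)
```

Keeping the pieces in a dictionary rather than in separate variables such as `dfa` and `dfb` means the same loop works no matter how many outer labels the real data has.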