
Suppose I have a list of tuples with index values:

mapper= [(0,6),(9,13),(17,27)]

I also have a large master_df that I want to split into multiple DataFrames based on the index values in the tuples above.

In each tuple, the first value (e.g. mapper[0][0]) is the starting point and the second (mapper[0][1]) is the ending point. I also have a list of DataFrame names:

df_list = ['df_1', 'df_2', 'df_3']

I tried the following snippet to populate multiple DataFrames based on the index values from mapper:

for x in range(len(df_list)):
    df_list[x] = master_df[mapper[x][0]:mapper[x][1]]

But it does not work the way I envisioned. The ideal solution would be three separate DataFrames, each a slice of master_df determined by the tuple index values in the list.

Here is an example of what I am trying to accomplish:

master_df:
     Name       Role        Location
0    Gina       Assistance  NY
1    Jake       Officer     Brooklyn
2    Boyle      Detective   99
3    Scully     Assistance  NY
4    Diaz       Officer     Brooklyn
5    Hitchcock  Detective   99
6    Amy        Assistance  NY
7    Terry      Officer     Brooklyn
8    Holt       Detective   99
9    Judy       Assistance  NY
10   Adrian     Officer     Brooklyn

mapper = [(0,3),(3,6),(6,11)]
df_list = ['df_1','df_2','df_3']

Desired outcome:

df_1:
     Name       Role        Location
0    Gina       Assistance  NY
1    Jake       Officer     Brooklyn
2    Boyle      Detective   99

df_2:
     Name       Role        Location
3    Scully     Assistance  NY
4    Diaz       Officer     Brooklyn
5    Hitchcock  Detective   99

df_3:
     Name       Role        Location
6    Amy        Assistance  NY
7    Terry      Officer     Brooklyn
8    Holt       Detective   99
9    Judy       Assistance  NY
10   Adrian     Officer     Brooklyn

Any help/guidance is appreciated!

olive
  • When indexing DataFrames by index label, you should use [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html). You can try `df_list = [master_df.loc(axis=0)[m[0]:m[1]] for m in mapper]` – DOOM Feb 24 '20 at 18:02
  • Some sample input and output would help make your question clearer; as written, it's confusing what you're actually trying to do. See: [mcve] – G. Anderson Feb 24 '20 at 18:51
  • @G.Anderson Thank you for the suggestion. Edited my question with some sample input, and output. – olive Feb 24 '20 at 19:41

1 Answer


You can unpack each tuple with * and pass it to range(), then use iloc[] to select the rows at those positions:

df_list = [master_df.iloc[range(*i), :] for i in mapper]

[     Name        Role  Location
 0   Gina  Assistance        NY
 1   Jake     Officer  Brooklyn
 2  Boyle   Detective        99,
         Name        Role  Location
 3     Scully  Assistance        NY
 4       Diaz     Officer  Brooklyn
 5  Hitchcock   Detective        99,
      Name        Role  Location
 6      Amy  Assistance        NY
 7    Terry     Officer  Brooklyn
 8     Holt   Detective        99
 9     Judy  Assistance        NY
 10  Adrian     Officer  Brooklyn]
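
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and splits it. It assumes the Location values are strings and uses a plain positional slice (`iloc[start:stop]`) instead of `range(*i)`, which should select the same rows; the name `splits` is just illustrative.

```python
import pandas as pd

# Sample data copied from the question (Location stored as strings here).
master_df = pd.DataFrame({
    "Name": ["Gina", "Jake", "Boyle", "Scully", "Diaz", "Hitchcock",
             "Amy", "Terry", "Holt", "Judy", "Adrian"],
    "Role": ["Assistance", "Officer", "Detective"] * 3 + ["Assistance", "Officer"],
    "Location": ["NY", "Brooklyn", "99"] * 3 + ["NY", "Brooklyn"],
})

mapper = [(0, 3), (3, 6), (6, 11)]

# Each tuple is (start, stop); the stop position is excluded, as with range().
splits = [master_df.iloc[start:stop] for start, stop in mapper]

for part in splits:
    print(part, end="\n\n")
```

Because `iloc` is position-based, this works regardless of what the index labels are.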

If you want them assigned to the names in df_list, store them in a dictionary keyed by name (see "How to create a variable number of variables"):

df_dict = dict(zip(df_list, [master_df.iloc[range(*i), :] for i in mapper]))

{'df_1':     Name        Role  Location
 0   Gina  Assistance        NY
 1   Jake     Officer  Brooklyn
 2  Boyle   Detective        99,
 'df_2':         Name        Role  Location
 3     Scully  Assistance        NY
 4       Diaz     Officer  Brooklyn
 5  Hitchcock   Detective        99,
 'df_3':       Name        Role  Location
 6      Amy  Assistance        NY
 7    Terry     Officer  Brooklyn
 8     Holt   Detective        99
 9     Judy  Assistance        NY
 10  Adrian     Officer  Brooklyn}
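
As a rough sketch of how the dictionary might be used afterwards, the example below builds it on a small stand-in frame (hypothetical data, shorter than the question's table) and looks up one split by name. The list name `df_names` is an assumption, used here so the names are not overwritten. It also shows that label-based `loc` slicing includes the end label, whereas `iloc` excludes the stop position, which matters if you follow the `loc` suggestion from the comments.

```python
import pandas as pd

# Small stand-in frame (hypothetical data, not the question's full table).
master_df = pd.DataFrame({"Name": ["Gina", "Jake", "Boyle", "Scully", "Diaz"]})
mapper = [(0, 2), (2, 5)]
df_names = ["df_1", "df_2"]

# zip pairs each name with its (start, stop) tuple.
df_dict = {name: master_df.iloc[start:stop]
           for name, (start, stop) in zip(df_names, mapper)}

# Look up a split by its name instead of creating a separate variable.
print(df_dict["df_2"])

# Position-based iloc excludes the stop; label-based loc includes the end label.
print(len(master_df.iloc[0:2]), len(master_df.loc[0:2]))
```

With the default integer index, `loc[0:2]` returns one more row than `iloc[0:2]`, so tuples like (0,3),(3,6) would produce overlapping splits under `loc`.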
G. Anderson