Select rows from dataframe which has multi index based on other dataframe columns

Question

I have 'univer' dataframe which has state and region name as columns,

    State   RegionName
0   Alabama Auburn
1   Alabama Florence
2   Alabama Jacksonville
3   Alabama Livingston

and 'statobot' dataframe has state and region name as index,

State   RegionName  2008Q3  2009Q2  ratio       
AK    Anchor Point  NaN NaN NaN
      Anchorage 296166.666667   271933.333333   1.089115
      Fairbanks 249966.666667   225833.333333   1.106863
      Homer NaN NaN NaN
      Juneau    305133.333333   282666.666667   1.079481
      Kenai NaN NaN NaN
      Ketchikan NaN NaN NaN
      Kodiak    NaN NaN NaN
      Lakes 257433.333333   257200.000000   1.000907
      North Pole    241833.333333   219366.666667   1.102416
      Palmer    259466.666667   263800.000000   0.983573

now I want to select rows in 'statobot' dataframe based on 'univer' dataframe, the state and region name must be exact fit, I have tried using

haveuni = statobot[(statobot.index.get_level_values(0).isin(univer['State'])) &(statobot.index.get_level_values(1).isin(univer['RegionName']))]

but the result rows are much more than I expected. Is there any other way to be more exact fit?

Can you add a sample of the dataframes you are talking about? Or a small example dataframes? — alejandro, Jun 18 '20 at 02:58
Hi, I tried to add one, but this is my first time asking a question, can you tell how I can add a dataframe? — Binke, Jun 18 '20 at 03:09
@Binke In your sample data frames, there are no matching data values for 'State' and 'RegionName'. I believe what you want is to use the 'State' and 'RegionName' in 'univer' and select the row having the same 'State' and 'RegionName' in 'statbot'? If you can provide a sample data set that is matching the logic you are explaining and the sample expected results, it will be great. Otherwise we are only playing a guessing game here. — davidbilla, Jun 18 '20 at 06:06
Please do not use code snippets if the code is not executable, also [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) can help when asking questions involving pandas dataframes. — vlizana, Jun 27 '20 at 03:08

davidbilla · Accepted Answer · 2020-06-18T07:21:22.973

The simplest one would be to merge the 2 data frames for intersection.

statbot = statbot.set_index(['State', 'RegionName'])
univer = univer.set_index(['State', 'RegionName'])

print(pd.merge(univer, statbot, left_index=True, right_index=True, how='inner'))

Here I have the same multi-index for the 2 dataframes. If you dont have them indices, you can specify the columns you want to merge on using the left_on and right_on parameters instead of left_index and right_index.

Otherwise there are some other ways using df.loc and cross-section - df.xs on multi-index

Loop through the 'univer' dataframe and find the rows matching the index - 'State' and 'RegionName' in 'statbot' dataframe

for i, row in univer.iterrows():
    print(statbot.loc[row['State'], row['RegionName']])

If you probably want the whole row without dropping the index fields then

for i, row in univer.iterrows():
    print(statbot.xs((row['State'], row['RegionName']), level=(0, 1), axis=0, drop_level=False))

Another method is use the slicing

statbot.reset_index(inplace=True)
for i, row in univer.iterrows():
    print(statbot[(statbot['State']==row['State']) & (statbot['RegionName']==row['RegionName'])])

I hope the pd.merge should work for your case. Let me know how it goes.

Select rows from dataframe which has multi index based on other dataframe columns

1 Answers1