0

This is a followup to

extract column value based on another column pandas dataframe

I have more than one row that matches the column value and want to know how to iterate to efficiently retrieve each value when there are multiple matches.

Dataframe is

A  B
p1 1
p1 2
p3 3
p2 4
p4 3
p5 5
p6 2
p7 5
... around 10000 rows

The below will always pick p3

df.loc[df['B'] == 3, 'A'].iloc[0]

So I tried to iterate like

 if(len(df.loc[df['B'] == 3, 'A'])) > 1:
                        for i in range(0,len(df.loc[df['B'] == 3, 'A']))-1):
                            print(i,df.loc[df['B'] == 3, 'A'].iloc[i])))

And it prints

0 p3
1 p4

for all matching values

However is there a more efficient way to do this?

SeaBean
  • 22,547
  • 3
  • 13
  • 25
user3423407
  • 341
  • 3
  • 13

2 Answers2

1

You can get all matching values by without using .iloc[0] in the df.loc code, as follows:

df.loc[df['B'] == 3, 'A']

Output:

2    p3
4    p4
Name: A, dtype: object

The 2 4 on the left of the output are the original row indices. You can use this information if want to know from which rows are these 2 extracted data originated from.

SeaBean
  • 22,547
  • 3
  • 13
  • 25
  • Thanks. Now I want to make this interactive, so accepting input from user. I try to compare that by iterating over the df for i, r in xdf.iteritems(): if inputvalue == r: if(len(df.loc[df['B'] == inputvalue, 'A'])) > 1: for i in range(0,len(df.loc[df['B'] == inputvalue, 'A']))-1): print(i,df.loc[df['B'] == inputvalue, 'A'].iloc[i]))) Output is 0 p3 0 p3 – user3423407 Aug 03 '21 at 14:03
  • @user3423407 Hi, you don't need to write the long loop to achieve the task. Just substitute your input value to `3` in the code above. That is, `df.loc[df['B'] == inputvalue, 'A']` – SeaBean Aug 03 '21 at 14:17
  • @user3423407 Your new looping codes doesn't work because you reused the variable `i` in 2 places with different usage. Thus, the second one is overwriting the first one. That's is, `for i in range(0,len(df.loc[df['B'] == inputvalue, 'A'])-1):` the `i` here is overwriting the outmost loop `i` in `for i, r in df.B.iteritems():`. – SeaBean Aug 03 '21 at 14:24
  • @user3423407 You need to change the last 2 lines to use another variable, says `j`: `for j in range(0,len(df.loc[df['B'] == inputvalue, 'A'])-1): print(i,df.loc[df['B'] == inputvalue, 'A'].iloc[j]) `. This will give you the expected result of `2 p3 4 p3` – SeaBean Aug 03 '21 at 14:26
  • @user3423407 Although we've fixed the bug of your looping codes, as I said earlier, this loop is not necessary. Just the code `df.loc[df['B'] == inputvalue, 'A']` is able to achieve the same result. Much simplified! – SeaBean Aug 03 '21 at 14:28
  • Understood. Thanks SeaBean – user3423407 Aug 03 '21 at 15:24
  • @user3423407 Welcome! Pleased to help! Happy coding :-) – SeaBean Aug 03 '21 at 15:28
0

Remove the iloc part and reset_index to really get your output:

df.loc[df['B'] == 3, 'A'].iloc[0]
                         ^^^^^^^^ Unnecessary
>>> df.loc[df['B'] == 3, 'A'].reset_index(drop=True)
0    p3
1    p4
Name: A, dtype: object
Corralien
  • 109,409
  • 8
  • 28
  • 52