228

I am kind of getting stuck on extracting value of one variable conditioning on another variable. For example, the following dataframe:

A  B
p1 1
p1 2
p3 3
p2 4

How can I get the value of A when B=3? Every time when I extracted the value of A, I got an object, not a string.

Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
Gejun
  • 4,012
  • 4
  • 17
  • 22
  • 1
    I see, I should add `item()` at the end. – Gejun Apr 18 '16 at 01:23
  • `df.query` and `pd.eval` seem like good fits for this use case. For information on the `pd.eval()` family of functions, their features and use cases, please visit [Dynamic Expression Evaluation in pandas using pd.eval()](https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval). – cs95 Dec 16 '18 at 04:58

7 Answers7

375

You could use loc to get series which satisfying your condition and then iloc to get first element:

In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4

In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2    p3
Name: A, dtype: object

In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
  • 42
    Thanks for your help. `df.loc[df['B'] == 3, 'A'].item()` works for me too. – Gejun Apr 18 '16 at 18:08
  • 10
    which one does it pick if the data frame has multiple entries of '3' in column B? – bananagator Oct 28 '18 at 21:56
  • 1
    df.loc[df["Condition_Column"] == 0, ("Column_1", "Column_2, "Column_3", "Column_4")] works for me. This example for select multiple columns. They should be in tuple. – Mustafa Uçar Dec 12 '18 at 07:30
  • if the df.loc[df['B'] == 300, 'A'].iloc[0] means what will be the output? – user1999109 Mar 02 '19 at 05:54
  • 1
    .item() apparently has been deprecated and will be removed. Is there another way to do this? I'm not interested in the name of the column or datatype which is also returned with the .loc method to query. – Dan Aug 14 '20 at 12:46
  • Another alternative to the .iloc[0] part at the end is .iat[0] which also just returns the first element but has slightly less overhead than .iloc. – Alex Nov 24 '20 at 04:57
  • @user1999109 if the condition doesn't have any matches (in your example, 300 is never found in column B), then the part before .iloc will return an empty series. If you then try to retrieve a value at a non-existent index (index 0), it returns an IndexError. – Alex Nov 24 '20 at 05:00
  • @andersonzhu What if there are two columns (example column C with similar values as B0? `df.loc[df['B'] == 3 && df['C'] == 6, 'A'].item()` doesn't work – Sulphur May 10 '21 at 01:06
  • 1
    @Sulphur you need to wrap the sub-statements with `()`. `df.loc[(df['B'] == 3) & (df['C'] == 6), 'A'].item()` should work. – Gejun May 11 '21 at 03:33
  • I wonder if this creates one intermediate `DataFrame` when it evaluates the expression `df['B'] == 3` or if it evaluates `loc` lazily. – plop Aug 11 '21 at 20:20
  • Great answer! I was able to extend this in some code I'm writing to `df = df[df.column1 != df.loc[df['column2'] == 0, 'column1'].iloc[0]]`, which for me, finds a bad item in column2, and removes all rows that have a match in column1. – atc_ceedee Feb 02 '22 at 20:27
  • When I'm using the suggested code df.loc[df['B'] == 3, 'A'].iloc[0], I'm only getting one value, since we are doing .iloc[0]. However, in my dataframe, I have multiple records for the same condition. How can I get all the values returned? – PKV Mar 31 '22 at 22:24
  • Can this be used in a .fillna() function? df.loc[df['B'] == 3, 'A'].iloc[0] – ssuhas76 Dec 13 '22 at 20:02
  • @ssuhas76 sure, in case `df.loc[df['B'] == 3, 'A'].iloc[0] is a hashable value (float, int, string, set) it can be used for `fillna`. – Anton Protopopov Dec 15 '22 at 15:50
86

You can try query, which is less typing:

df.query('B==3')['A']
Scarabee
  • 5,437
  • 5
  • 29
  • 55
PhilChang
  • 2,591
  • 1
  • 16
  • 18
  • Query is interesting cause we can add more complex clauses to it as well – Samir Baid Jul 09 '18 at 20:00
  • 3
    IMHO, this is the best answer. – NLR Apr 26 '19 at 15:51
  • 1
    @NLR It depends. `query` creates a new `DataFrame`, and then the slice `['A']` creates yet another `DataFrame`. For large cases one would like to avoid the unnecessary intermediate object. – plop Aug 11 '21 at 20:12
57

Try:

df[df['B']==3]['A'].item()

assuming df is your pandas.DataFrame.

mirekphd
  • 4,799
  • 3
  • 38
  • 59
emited
  • 571
  • 3
  • 2
23

Use df[df['B']==3]['A'].values[0] if you just want item itself without the brackets

Baz
  • 528
  • 3
  • 9
14

Edited: What I described below under Previous is chained indexing and may not work in some situations. The best practice is to use loc, but the concept is the same:

df.loc[row, col]

row and col can be specified directly (e.g., 'A' or ['A', 'B']) or with a mask (e.g. df['B'] == 3). Using the example below:

df.loc[df['B'] == 3, 'A']

Previous: It's easier for me to think in these terms, but borrowing from other answers. The value you want is located in a dataframe:

df[*column*][*row*]

where column and row point to the values you want returned. For your example, column is 'A' and for row you use a mask:

df['B'] == 3

To get the first matched value from the series there are several options:

df['A'][df['B'] == 3].values[0]
df['A'][df['B'] == 3].iloc[0]
df['A'][df['B'] == 3].to_numpy()[0]
drT
  • 156
  • 1
  • 5
3

You can use squeeze instead of iloc[0]. It looks clearer if you have only one value:

df.loc[df['B'] == 3, 'A'].squeeze()

Output:

'p3'
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
1
df.loc[df['B']=='give-specific-value', 'A']````

I have also worked on this clausing and extraction operations for my assignment.