return default if pandas dataframe.loc location doesn't exist

Question

I find myself often having to check whether a column or row exists in a dataframe before trying to reference it. For example I end up adding a lot of code like:

if 'mycol' in df.columns and 'myindex' in df.index:
    x = df.loc['myindex', 'mycol']
else:
    x = mydefault

Is there any way to do this more nicely? For example on an arbitrary object I can do x = getattr(anobject, 'id', default) - is there anything similar to this in pandas? Really any way to achieve what I'm doing more gracefully?

score 59 · Accepted Answer · answered May 01 '14 at 08:49


Series has a get method (Series.get) that returns a default value when the label is missing.

So you could do:

df.mycol.get('myindex', np.nan)

Example:

In [117]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'dummy': np.arange(5), 'mycol': np.arange(5)})
df
Out[117]:
   dummy  mycol
0      0      0
1      1      1
2      2      2
3      3      3
4      4      4

[5 rows x 2 columns]
In [118]:

# label 2 exists, label 5 does not
print(df.mycol.get(2, np.nan))
print(df.mycol.get(5, np.nan))
2
nan

answered May 01 '14 at 08:49

EdChum


I was also able to get it to work when the index is known to exist: `df.loc['myindex'].get('mycol', NaN)` A shame that you still need to be sure that one of the index or column exists, but nonetheless this will be useful in a lot of scenarios. Thank you! – fantabolous May 01 '14 at 12:25
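
The limitation raised in this comment (one of the two labels has to be known to exist) can be worked around by chaining DataFrame.get and Series.get. The following is only a sketch: the helper name get_cell and the sample frame are made up for illustration, and it assumes unique column names.

```python
import numpy as np
import pandas as pd

def get_cell(df, row, col, default=np.nan):
    """Hypothetical helper: return df.loc[row, col], or `default` if either label is missing."""
    column = df.get(col)  # DataFrame.get returns None when the column is absent
    if column is None:
        return default
    return column.get(row, default)  # Series.get returns `default` when the row label is absent

# Illustrative data, not from the original question
df = pd.DataFrame({"mycol": [10, 20, 30]}, index=["a", "b", "c"])
print(get_cell(df, "b", "mycol"))              # 20
print(get_cell(df, "z", "mycol", default=-1))  # -1 (missing row)
print(get_cell(df, "b", "nocol", default=-1))  # -1 (missing column)
```

This keeps the lookup free of try/except, at the cost of two method calls per cell.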

score 26 · Answer 2 · edited Oct 30 '19 at 03:41


Python favors asking for forgiveness over asking for permission. You'll find a lot of posts on this matter, such as this one.

In Python, catching exceptions is relatively inexpensive, so you're encouraged to use them. This is called the EAFP ("easier to ask forgiveness than permission") approach.

For example:

try:
    x = df.loc['myindex', 'mycol']
except KeyError:
    x = mydefault
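
If this pattern shows up in many places, the try/except can be wrapped once in a small function. This is a minimal sketch: loc_or_default is a hypothetical name, not a pandas API, and it assumes scalar row and column labels.

```python
import pandas as pd

def loc_or_default(df, row, col, default=None):
    """Hypothetical helper: EAFP lookup of a single cell with a fallback value."""
    try:
        return df.loc[row, col]
    except KeyError:
        # Raised when the row label or the column label does not exist
        return default

# Illustrative data, not from the original question
df = pd.DataFrame({"mycol": [1, 2, 3]}, index=["a", "b", "c"])
print(loc_or_default(df, "a", "mycol", default=0))        # 1
print(loc_or_default(df, "missing", "mycol", default=0))  # 0
print(loc_or_default(df, "a", "nocol", default=0))        # 0
```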

edited Oct 30 '19 at 03:41

fantabolous


answered May 01 '14 at 08:10

FooBar


Perhaps I should use more EAFP, but my personal preference is to save try/excepts for when there's no other easy choice. Thanks though. – fantabolous May 01 '14 at 12:28

@Foobar: according to this [link](https://stackoverflow.com/questions/2522005/cost-of-exception-handlers-in-python), only the `try:` is inexpensive; the `except:` seems to be expensive. The moral of the story seems to be that the caller is left to decide between testing for existence and `try: except:`ing. The performance trade-off depends on your use case, i.e. how long it takes to test for existence versus how often the unguarded lookup will `raise`. Nevertheless, it would be nice if pandas offered syntactic sugar that made this choice argument-driven. As far as I can tell, it does not. – OldSchool May 14 '20 at 16:13
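
The trade-off described in this comment is easy to measure for your own data. The sketch below times both styles on a hit and on a miss with timeit; the function names, the sample frame and the repeat count are arbitrary choices, and the numbers will vary by machine and pandas version, so no expected timings are shown.

```python
import timeit
import pandas as pd

# Illustrative data, not from the original question
df = pd.DataFrame({"mycol": [1, 2, 3]}, index=["a", "b", "c"])
DEFAULT = -1

def check_first(row, col):
    # Test for existence before the lookup
    if col in df.columns and row in df.index:
        return df.loc[row, col]
    return DEFAULT

def try_except(row, col):
    # Attempt the lookup and fall back on KeyError
    try:
        return df.loc[row, col]
    except KeyError:
        return DEFAULT

for label, row in [("hit", "a"), ("miss", "zzz")]:
    t_check = timeit.timeit(lambda: check_first(row, "mycol"), number=1000)
    t_try = timeit.timeit(lambda: try_except(row, "mycol"), number=1000)
    print(f"{label}: check-first {t_check:.4f}s, try/except {t_try:.4f}s")
```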

Joe · Answer 3 · 2023-03-21T16:40:46.210

Use reindex:

df.reindex(index=['myindex'], columns=['mycol'], fill_value=mydefault)

A nice property of reindex is that index and columns both take lists, so you can request several labels at once, some that exist and some that don't, and you get the fallback value wherever either the index label or the column is missing.

Example:

In[1]:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [5, 3, 7],
})
df

Out[1]:
    A   B
0   1   5
1   2   3
2   3   7

In[2]:
df.reindex(index=[0, 1, 100], columns=['A', 'C'], fill_value='FV')

Out[2]:
    A   C
0   1   FV
1   2   FV
100 FV  FV
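
Note that reindex returns a DataFrame rather than a scalar, so a single-cell lookup still needs one more step. A possible sketch, assuming unique index and column labels (reindex raises on duplicate labels) and using the hypothetical helper name cell_or_default:

```python
import pandas as pd

def cell_or_default(df, row, col, default):
    """Hypothetical helper: look up one cell via reindex, falling back to `default`."""
    # reindex builds a 1x1 frame; fill_value is used only for labels that were missing
    one_cell = df.reindex(index=[row], columns=[col], fill_value=default)
    return one_cell.iat[0, 0]

df = pd.DataFrame({"A": [1, 2, 3], "B": [5, 3, 7]})
print(cell_or_default(df, 0, "A", "FV"))    # 1
print(cell_or_default(df, 100, "A", "FV"))  # FV
print(cell_or_default(df, 0, "C", "FV"))    # FV
```

Because fill_value applies only to labels introduced by the reindex, a NaN already stored in an existing cell is returned as NaN, not as the default.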
