
I often find myself having to check whether a column or row exists in a DataFrame before trying to reference it. For example, I end up adding a lot of code like:

if 'mycol' in df.columns and 'myindex' in df.index:
    x = df.loc['myindex', 'mycol']
else:
    x = mydefault

Is there a cleaner way to do this? For example, on an arbitrary object I can write x = getattr(anobject, 'id', default). Is there anything similar in pandas? Really, is there any way to achieve this more gracefully?

fantabolous

3 Answers


There is a get method for Series, so you could do:

df.mycol.get(myindex, np.nan)

Example:

In [117]:

df = pd.DataFrame({'mycol': np.arange(5), 'dummy': np.arange(5)})
df
Out[117]:
   dummy  mycol
0      0      0
1      1      1
2      2      2
3      3      3
4      4      4

[5 rows x 2 columns]
In [118]:

print(df.mycol.get(2, np.nan))
print(df.mycol.get(5, np.nan))
2
nan
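As a sketch of how this could also tolerate a missing column, note that DataFrame.get likewise returns a default (None unless given) when a column label is absent, so the column lookup and the row lookup can be chained. The helper name get_cell is hypothetical, and this assumes unique column labels (with duplicate labels, df.get returns a DataFrame rather than a Series).

```python
import numpy as np
import pandas as pd


def get_cell(df, row, col, default=np.nan):
    """Return df.loc[row, col], or `default` if either label is missing.

    Hypothetical helper: chains DataFrame.get (column lookup) with
    Series.get (row lookup). Assumes unique column labels.
    """
    column = df.get(col)  # None when the column label is missing
    if column is None:
        return default
    return column.get(row, default)  # `default` when the row label is missing


df = pd.DataFrame({'mycol': np.arange(5), 'dummy': np.arange(5)})
print(get_cell(df, 2, 'mycol'))       # row and column both exist
print(get_cell(df, 5, 'mycol', -1))   # missing row
print(get_cell(df, 2, 'nocol', -1))   # missing column
```

This keeps the "look before you leap" style of the question but moves both checks into the lookups themselves.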
EdChum
    I was also able to get it to work when the row label is known to exist: `df.loc['myindex'].get('mycol', NaN)`. It's a shame that you still need to be sure that either the index or the column exists, but nonetheless this will be useful in a lot of scenarios. Thank you! – fantabolous May 01 '14 at 12:25

Python's philosophy is to ask for forgiveness rather than permission. You'll find a lot of posts on this topic, such as this one.

In Python, catching exceptions is relatively inexpensive, so you're encouraged to use them. This is called the EAFP ("easier to ask for forgiveness than permission") approach.

For example:

try:
    x = df.loc['myindex', 'mycol']
except KeyError:
    x = mydefault
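A self-contained sketch of the same pattern, wrapped in a hypothetical helper named lookup and run against a small sample frame. It assumes scalar row and column labels. Because .loc raises KeyError whether the row label or the column label is missing, a single except clause covers both cases.

```python
import pandas as pd


def lookup(df, row, col, default=None):
    """EAFP-style cell lookup (hypothetical helper name)."""
    try:
        return df.loc[row, col]
    except KeyError:
        # Raised for a missing row label and for a missing column label alike
        return default


df = pd.DataFrame({'A': [1, 2, 3], 'B': [5, 3, 7]})
print(lookup(df, 1, 'A', 'FV'))    # -> 2
print(lookup(df, 100, 'A', 'FV'))  # -> FV (missing row)
print(lookup(df, 0, 'C', 'FV'))    # -> FV (missing column)
```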
FooBar
    Perhaps I should use more EAFP, but my personal preference is to save try/excepts for when there's no other easy choice. Thanks though. – fantabolous May 01 '14 at 12:28
    @FooBar: according to this [link](https://stackoverflow.com/questions/2522005/cost-of-exception-handlers-in-python), only the `try:` is inexpensive; the `except:` path seems to be expensive. The moral of the story seems to be that the caller has to decide between testing for existence and using `try`/`except`. The performance trade-off depends on your use case, i.e. how long it takes to test for existence vs. how often skipping the test will `raise`. Nevertheless, it would be nice if pandas offered syntactic sugar that let an argument drive that choice. As far as I can tell, it does not. – OldSchool May 14 '20 at 16:13

Use reindex:

df.reindex(index=['myindex'], columns=['mycol'], fill_value=mydefault)

The nice part is that you can pass lists for the index and columns in which some labels exist and some don't, and every cell whose row or column label is missing gets the fallback value.

Example:

In[1]:
df = pd.DataFrame({ 
 'A':[1, 2, 3],
 'B':[5, 3, 7],
})
df

Out[1]:
    A   B
0   1   5
1   2   3
2   3   7

In[2]:
df.reindex(index=[0, 1, 100], columns=['A', 'C'], fill_value='FV')

Out[2]:
    A   C
0   1   FV
1   2   FV
100 FV  FV
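Note that reindex returns a DataFrame, not a scalar. A minimal sketch of getting the single value back out, using a hypothetical helper named get_cell_reindex: with one-element lists the result is a 1x1 frame, so .iat[0, 0] extracts the value. This builds a new frame on every call, so it is heavier than Series.get for a single cell, and reindex raises a ValueError if the axis being reindexed has duplicate labels.

```python
import pandas as pd


def get_cell_reindex(df, row, col, default):
    """Scalar lookup via reindex (hypothetical helper name).

    Reindexing with one-element lists yields a 1x1 DataFrame whose only
    value is `default` when the row or column label is missing.
    """
    return df.reindex(index=[row], columns=[col], fill_value=default).iat[0, 0]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [5, 3, 7]})
print(get_cell_reindex(df, 1, 'A', 'FV'))    # -> 2
print(get_cell_reindex(df, 100, 'A', 'FV'))  # -> FV (missing row)
print(get_cell_reindex(df, 0, 'C', 'FV'))    # -> FV (missing column)
```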
Joe