591

How do I check if a column exists in a Pandas DataFrame df?

   A   B    C
0  3  40  100
1  6  30  200

How would I check if the column "A" exists in the above DataFrame so that I can compute:

df['sum'] = df['A'] + df['C']

And if "A" doesn't exist:

df['sum'] = df['B'] + df['C']
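To try the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch: the values are copied from the printout, and a default integer index is assumed.

```python
import pandas as pd

# Rebuild the sample frame from the printout above (default RangeIndex assumed)
df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
print(df)
```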
cottontail
npires

5 Answers

1167

This will work:

if 'A' in df:
    df['sum'] = df['A'] + df['C']

But for clarity, I'd probably write it as:

if 'A' in df.columns:
    df['sum'] = df['A'] + df['C']
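A minimal sketch of the full branch from the question, using the membership test above. The helper name add_sum is hypothetical, and it works on a copy so the input frame is left unchanged:

```python
import pandas as pd

def add_sum(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add 'sum' as A + C if 'A' exists, else B + C."""
    out = df.copy()  # avoid mutating the caller's frame
    # Membership on .columns tests the column labels explicitly
    if 'A' in out.columns:
        out['sum'] = out['A'] + out['C']
    else:
        out['sum'] = out['B'] + out['C']
    return out

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
print(add_sum(df)['sum'].tolist())                     # [103, 206]
print(add_sum(df.drop(columns='A'))['sum'].tolist())   # [140, 230]
```

Both 'A' in df and 'A' in df.columns test the column labels (not the index or the values); the second form just makes that explicit.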
chrisb
192

To check if one or more columns all exist, you can use set.issubset, as in:

if set(['A','C']).issubset(df.columns):
    df['sum'] = df['A'] + df['C']

As @brianpck points out in a comment, the set can alternatively be written as a literal with curly braces:

if {'A', 'C'}.issubset(df.columns):
    df['sum'] = df['A'] + df['C']

See this question for a discussion of the curly-braces syntax.

Or you can use a generator expression with all():

if all(item in df.columns for item in ['A', 'C']):
    df['sum'] = df['A'] + df['C']
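A small sketch comparing the three subset checks above on the sample frame (the list names required and missing are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
required = ['A', 'C']

# Three equivalent "all of these columns exist" checks
via_set_call = set(required).issubset(df.columns)
via_set_literal = {'A', 'C'}.issubset(df.columns)
via_generator = all(name in df.columns for name in required)
print(via_set_call, via_set_literal, via_generator)  # True True True

# With one name missing, every form returns False
missing = ['A', 'D']
print(set(missing).issubset(df.columns),
      all(name in df.columns for name in missing))   # False False
```

set.issubset accepts any iterable, so df.columns can be passed directly without converting it to a set first.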
C8H10N4O2
19

Another way, without an if statement, is the DataFrame get() method, which accepts a default value. For the sum in the question:

df['sum'] = df.get('A', df['B']) + df['C']

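DataFrame.get behaves like Python's dict.get: it returns the column if it exists and the default otherwise (None if no default is given). Note that the default, df['B'], is evaluated before get() is called, so column B must exist even when A does.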
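A minimal sketch of the get() approach applied to both cases from the question, using a copy of the sample frame that lacks column A (the name no_a is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
no_a = df.drop(columns='A')  # new frame without column A

# get() returns the column if present, else the default (here column B)
df['sum'] = df.get('A', df['B']) + df['C']
no_a['sum'] = no_a.get('A', no_a['B']) + no_a['C']
print(df['sum'].tolist())    # [103, 206]
print(no_a['sum'].tolist())  # [140, 230]

# Without a default, a missing column yields None instead of raising KeyError
print(no_a.get('A'))  # None
```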

Gerges
  • `df.get("A") + df.get("B")` still gives you an error if those don't exist, just the more confusing `TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'` rather than the easier-to-debug `KeyError`. `.get()` should only be used if you're actually planning on using the default value, otherwise it just pushes the error away from the point of failure and makes the state contract more confusing to intuit. The whole point of Gerges' answer is to use the second parameter to `.get()` to specify a column you know will exist as a fallback, not to let a bunch of Nones crash the code. – ggorlen Nov 11 '21 at 00:23
  • This is nice because I can check "column exists and is not NaN" with `if pandas.notnull(df.get("sum"))`. – Noumenon Apr 24 '23 at 13:19
9

You can also call isin() on the columns to check whether specific column names are present, then call any() on the result to reduce it to a single boolean value (see note 1). For example, to check whether a DataFrame contains column A or column C (at least one of them), you could write:

if df.columns.isin(['A', 'C']).any():
    pass  # do something

To check that a column name is not present, use the not in operator in the if clause:

if 'A' not in df:
    pass  # do something

or negate the isin().any() call (this is True only when neither A nor C exists):

if not df.columns.isin(['A', 'C']).any():
    pass  # do something

1: Calling isin() on the columns returns a boolean array with one entry per column: True where the column name is A or C, False otherwise. Because the truth value of an array with more than one element is ambiguous, any() reduces it to a single True/False value.
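A short sketch of what the isin() mask looks like on the sample frame, plus one pitfall: calling all() on this mask asks whether every column is A or C, not whether both A and C exist. The reversed-direction check at the end is one way to express the latter, offered as an illustration rather than as part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

mask = df.columns.isin(['A', 'C'])  # one boolean per column: True, False, True
print(mask.any())  # True: at least one of A, C is present
print(mask.all())  # False: column B is neither A nor C

# "Do both A and C exist?" -> test the wanted names against the columns
print(pd.Index(['A', 'C']).isin(df.columns).all())  # True
print('D' not in df)  # True
```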

cottontail
6

You can also use the set method issuperset:

set(df).issuperset(['A', 'B'])
# set(df.columns).issuperset(['A', 'B'])
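A quick sketch on the sample frame; it relies on iteration over a DataFrame yielding its column labels, which is why set(df) and set(df.columns) are the same:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

# Iterating a DataFrame yields its column labels
print(set(df) == set(df.columns))      # True
print(set(df).issuperset(['A', 'B']))  # True
print(set(df).issuperset(['A', 'D']))  # False
```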
Mykola Zotko