12

Is there a more sophisticated way to check if a dataframe df contains 2 columns named Column 1 and Column 2:

if numpy.all(map(lambda c: c in df.columns, ['Column 1', 'Column 2'])):
    do_something()
Nicola
Hendrik Wiese

4 Answers

19

I know it's an old post...

From this answer:

if set(['Column 1', 'Column 2']).issubset(df.columns):
    do_something()

or, a little more elegantly:

if {'Column 1', 'Column 2'}.issubset(df.columns):
    do_something()
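
As a quick check of this behaviour, here is a minimal sketch on a made-up DataFrame (the data and the variable names are illustrative assumptions, not taken from the question). It relies on `set.issubset` accepting any iterable, so the column Index can be passed in directly:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'Column 1': [1, 2],
                   'Column 2': [3, 4],
                   'Column 3': [5, 6]})

# True only if every name in the set is a column of df;
# extra columns in df (here 'Column 3') do not matter
both_present = {'Column 1', 'Column 2'}.issubset(df.columns)
one_missing = {'Column 1', 'Column 9'}.issubset(df.columns)

print(both_present)  # True
print(one_missing)   # False
```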
nick
Perico
  • This returns true if all columns exist in the df, even if the df contains other columns as well. Thanks! – Adrian Tofting Jul 11 '19 at 12:02
  • Yes, that was what Mr. Wiese was looking for: check whether column 1 and column 2 are in the dataframe and then do something. He doesn't care whether column 3 is in the df. – Perico Jul 22 '19 at 18:13
16

You can use Index.isin:

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

To check whether at least one of the listed names is a column, use any:

cols = ['A', 'B']
print (df.columns.isin(cols).any())
True

cols = ['W', 'B']
print (df.columns.isin(cols).any())
True

cols = ['W', 'Z']
print (df.columns.isin(cols).any())
False

To check whether every column of the DataFrame is in the list, use all:

cols = ['A', 'B', 'C','D','E','F']
print (df.columns.isin(cols).all())
True

cols = ['W', 'Z']
print (df.columns.isin(cols).all())
False
jezrael
  • Well, thanks, but that doesn't look so much better from my point of view, unfortunately... I was hoping there'd be something syntactically more pleasing. – Hendrik Wiese Jun 28 '16 at 07:57
  • I added another solution, please check it. – jezrael Jun 28 '16 at 08:01
  • And this solution is better, so I removed the old one. – jezrael Jun 28 '16 at 08:04
  • That `isin` solution already looks way better. My favorite solution was like `cols in df.columns` but that wouldn't work because it can't distinguish between all and any. – Hendrik Wiese Jun 28 '16 at 08:04
  • You should use the `issubset` solution if you want to check whether both `'A'` and `'B'` are in the columns list. `all()` in that case will give you `False`, since `isin` returns one boolean per column of the DataFrame, and those for columns outside the list are `False`. – cosmarc May 08 '19 at 07:51
4

The one issue with the given answer (and maybe it works for the OP) is that it tests to see if all of the dataframe's columns are in a given list - but not that all of the given list's items are in the dataframe columns.

My solution was:

test = all([ i in df.columns for i in ['A', 'B'] ])

Where test is a simple True or False
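
To make the difference between the two directions concrete, here is a small sketch on a hypothetical three-column frame (the data and the names `wanted`, `wanted_in_df` and `df_in_wanted` are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical frame with one column that is not in the wanted list
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
wanted = ['A', 'B']

# Are all wanted names among the DataFrame's columns?
wanted_in_df = all(name in df.columns for name in wanted)

# Are all of the DataFrame's columns among the wanted names?
df_in_wanted = bool(df.columns.isin(wanted).all())

print(wanted_in_df)  # True
print(df_in_wanted)  # False, because 'C' is not in wanted
```

The two checks only agree when the DataFrame has no columns beyond the ones being looked for.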

elPastor
0

To check whether every item of a list is among a dataframe's columns while still using isin, you can do the following:

col_list = ['A', 'B'] 
pd.Index(col_list).isin(df.columns).all()

As explained in the accepted answer, .all() is to check if all items in col_list are present in the columns, while .any() is to test the presence of any of them.
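
A minimal sketch of how this plays out when only some of the names exist, assuming a made-up frame with columns A, B and C (the `missing` list is an illustrative extra, not part of the answer):

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
col_list = ['A', 'W']

# One boolean per item of col_list: is that name a column of df?
present = pd.Index(col_list).isin(df.columns)

all_present = bool(present.all())  # False, 'W' is not a column
any_present = bool(present.any())  # True, 'A' is a column

# The same mask can be used to list the names that were not found
missing = [name for name, ok in zip(col_list, present) if not ok]
print(all_present, any_present, missing)
```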

abdelgha4