
I'm trying to find out whether a specific column exists in my DataFrame columns or not, but I have some problems.

What I do: I use the boolean operator "not in" (I've also tried any(), all(), and "in") to look for the specific column headers, and it doesn't seem to work properly.

Let's say my DataFrame column headers are:

df.columns = ['El-array', 'a', 'b', 'm', 'n', 'Rho', 'dev', 'ip', 'sp', 'vp', 'i',
   'M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8', 'M9', 'M10', 'M11',
   'M12', 'M13', 'M14', 'M15', 'M16', 'M17', 'M18', 'M19', 'M20', 'TM1',
   'TM2', 'resist', 'DC_slope']

and I'm trying to check whether all of 'M1', 'M2', ... 'M20' and 'TM1' are there. If one or more of them is missing, the rest of my code will not work.

So I say:

    if any(['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11',
        'M12','M13','M14','M15','M16','M17','M18','M19','M20', 'TM1']) not in df.columns: 
        print('Incomplete dataset')

Now, even when df has all the required column headers, the if statement still prints the 'Incomplete dataset' message. I have tried "all() not in" too, with the same result. I have also tried:

if 'M1' and 'M2' and ... and 'M20' and 'TM1' in df.columns:
    "Do this"
else:
    print('Incomplete dataset')

or

if 'M1' or 'M2' or ... or 'M20' and 'TM1' not in df.columns:
    print('Incomplete dataset')
else:
    "Do this"

This still prints 'Incomplete dataset'.

For a truly incomplete dataset I get the same result.

Karl Knechtel
Sina
  • Just some advice on how python is working: `if any(['M1','M2',...'M19','M20', 'TM1']) not in df.columns` is really doing `if True not in df.columns` because `any(['M1','M2',...'M19','M20', 'TM1']) == True` – Dillon Jun 15 '18 at 15:47
  • This question is not about `pandas` at all; the fact that the list of strings comes from the column names is irrelevant to the problem. – Karl Knechtel Jul 06 '22 at 06:29

1 Answer


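You have a fundamental misunderstanding of how `any` and `or` work. `any(iterable)` returns a single `True` or `False`, and `and`/`or` combine whole expressions rather than distributing `in` over each name. I suggest going back and having a look at the documentation for both.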
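To make the problem concrete, here is a minimal sketch of what the attempted expressions actually evaluate to. It assumes a plain list `columns` as a shortened, hypothetical stand-in for `df.columns`, and the names `required`, `collapsed`, `wrong_any`, `chained`, `wrong_and` and `wrong_or` are only illustrative.

```python
# Hypothetical, shortened stand-in for df.columns
columns = ['a', 'b', 'M1', 'M2', 'TM1']
required = ['M1', 'M2', 'TM1']

# any() over non-empty strings collapses to one boolean
collapsed = any(required)                 # True
# so this asks whether True is absent from the column names
wrong_any = collapsed not in columns      # True, although nothing is missing

# `and` returns its last operand when every operand is truthy
chained = 'M1' and 'M2' and 'TM1'         # 'TM1'

# `in` binds tighter than `and`, so only 'X3'/'TM1' is ever looked up;
# 'X1' and 'X2' are just truthy strings
wrong_and = 'X1' and 'X2' and 'TM1' in columns    # True

# `or` short-circuits on the first truthy string, so the test never runs
wrong_or = 'M1' or 'M2' and 'TM1' not in columns  # 'M1', which is truthy

print(collapsed, wrong_any, chained, wrong_and, wrong_or)
```

In every case the membership test is applied to at most one value, which is why the result does not depend on whether the other names are present.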

You want:

    names = ['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11',
             'M12','M13','M14','M15','M16','M17','M18','M19','M20', 'TM1']
    # True as soon as one required name is missing from the columns
    if any(name not in df.columns for name in names):
        print('Incomplete dataset')
    else:
        ...
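As a runnable sketch of the per-name check, the version below tests each name individually with `all`, which is the logical mirror of the `any(... not in ...)` form. It assumes plain lists in place of `df.columns`; `is_complete`, `full` and `partial` are hypothetical names, and the required names are generated rather than typed out.

```python
# Build 'M1'..'M20' plus 'TM1' instead of typing every name
required = [f'M{i}' for i in range(1, 21)] + ['TM1']


def is_complete(columns, required):
    """Return True only if every required name appears in columns."""
    return all(name in columns for name in required)


# Hypothetical column lists standing in for df.columns
full = ['El-array', 'a', 'b'] + required + ['resist']
partial = [c for c in full if c != 'M7']  # same columns with 'M7' dropped

print(is_complete(full, required))     # True
print(is_complete(partial, required))  # False
```

With a real DataFrame, passing `df.columns` as `columns` should behave the same way, since `in` on the column index tests for a label.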

Alternatively (and this is really just for minimal performance gain), you can use the set difference, which returns all values that are in names but not in df.columns. An empty set is falsy, so the condition below is true only when nothing is missing:

    if not set(names) - set(df.columns):
        ...
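One practical side effect of the set difference is that it tells you which names are missing, not just that something is. A minimal sketch, assuming a plain list `columns` as a hypothetical stand-in for `df.columns`:

```python
required = ['M1', 'M2', 'TM1']
columns = ['a', 'M1', 'Rho']  # hypothetical stand-in for df.columns

# Names that are required but absent from the columns
missing = set(required) - set(columns)

if missing:
    # sorted() gives a stable order, since sets are unordered
    print('Incomplete dataset, missing:', sorted(missing))
else:
    print('All required columns present')

# Equivalent yes/no check without building the difference
complete = set(required).issubset(columns)
```

Here `missing` contains 'M2' and 'TM1', and `complete` is `False`.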
FHTMitchell