I created a DataFrame in pandas and wrote a check like the one below.

if (a['conversion'] == line.strip()).any() \
    or (a['x'] == line.strip()).any() \
    or (a['y'] == line.strip()).any() \
    or (a['z'] == line.strip()).any():
    generate(line)
    

Basically, I would like to check the keys x, y, and z, but some types do not have the key 'y' or 'z', so KeyError: 'y' is raised.

How can I access only the existing keys? Do I have to use try/except? Or, if I have to use the .get(['y']) function, how can I implement it? (It did not really work.)

JGPARK
    Welcome to Stack Overflow! Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 23 '21 at 00:54

1 Answer

You have two approaches:

  1. Check if a column exists (Look Before You Leap):

    target = line.strip()
    if (
        (df['conversion'] == target).any()
        or (df['x'] == target).any()
        # Only compare 'y' and 'z' if the columns are part of the dataframe
        or ('y' in df.columns and (df['y'] == target).any())
        or ('z' in df.columns and (df['z'] == target).any())
    ):
        generate(line)
    

    or

    columns = ('conversion', 'x', 'y', 'z')
    for column in columns:
        # If this column is not present, just keep looking
        if column not in df:
            continue
    
        if (df[column] == line.strip()).any():
            generate(line)
            break
    
  2. Try to access the damn column anyway (Easier to Ask for Forgiveness than Permission):

    condition = False
    columns = ('conversion', 'x', 'y', 'z')
    for column in columns:
        try:
            # Check whether any value in this column matches the line
            condition = (df[column] == line.strip()).any()
        except KeyError:
            # If the key is not present, just keep looking
            continue
        else:
            # Stop at the first match, like a short-circuiting `or`
            if condition:
                generate(line)
                break
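
As a self-contained illustration of the first approach, here is a minimal sketch that filters the candidate columns down to the ones that exist before comparing. The helper name `matches_any_column` and the sample data are assumptions made up for this example, not part of the original question, and `isin` is swapped in for the per-column `==` comparison.

```python
import pandas as pd


def matches_any_column(df, value, columns=('conversion', 'x', 'y', 'z')):
    """Return True if `value` appears in any of `columns` that exist in `df`."""
    # Keep only the candidate columns that are actually present
    present = [column for column in columns if column in df.columns]
    if not present:
        return False
    # isin() builds a boolean frame; the two any() calls reduce it to one value
    return bool(df[present].isin([value]).any().any())


# Hypothetical sample data: this frame has no 'y' or 'z' column
df = pd.DataFrame({'conversion': ['a', 'b'], 'x': ['c', 'd']})

print(matches_any_column(df, 'c'))  # True
print(matches_any_column(df, 'z'))  # False
```

With a helper like this, the check in the question could become `if matches_any_column(df, line.strip()): generate(line)`, and missing columns are skipped rather than raising `KeyError`.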
    
Wai Ha Lee
enzo
  • One more question: even though we added the `'y' in df.columns` condition, we still use `df['y']` in the comparison. It seems like we still try to access `df['y']` even though it does not exist. Why does it not raise an error? – JGPARK May 23 '21 at 01:09
  • That's because of Python's [short-circuiting](https://stackoverflow.com/q/2580136/9997212), where the `and` operator only evaluates the second argument if the first one is true. So, if `'y' in df.columns` is true (i.e. the column exists), the second argument is evaluated, and since the column exists, it doesn't raise any errors. If `'y' in df.columns` is false (i.e. the column does not exist), the second argument is not evaluated. If it were, it would raise a `KeyError`. That's the benefit of short-circuiting. I don't know if that's what your question meant, but feel free to correct me. – enzo May 23 '21 at 01:15
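
The short-circuiting described above can be seen in a small sketch that uses a plain dict in place of the DataFrame. The `lookup` helper, the `calls` list, and the `row` data are hypothetical names introduced only to record when the right-hand side is actually evaluated.

```python
calls = []  # records every key that lookup() was asked for


def lookup(mapping, key):
    """Stand-in for df['y']: raises KeyError when the key is missing."""
    calls.append(key)
    return mapping[key]


row = {'x': 1}  # hypothetical data with no 'y' key

# 'y' in row is False, so the right-hand side is never evaluated
result = 'y' in row and lookup(row, 'y') == 1
print(result)  # False
print(calls)   # []

# Without the guard, the lookup runs and raises KeyError
try:
    lookup(row, 'y') == 1
except KeyError as error:
    print(f"KeyError: {error}")  # KeyError: 'y'
```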