Note: my question isn't this one, but something a little more subtle.

Say I have a dataframe that looks like this

df = 
    A     B    C
0   3     3    1
1   2     1    9

df[["A", "B", "D"]] will raise a KeyError.

Is there an idiomatic pandas way to make df[["A", "B", "D"]] behave like df[["A", "B"]]? (I.e., select only the columns that exist.)

One solution might be

good_columns = list(set(df.columns).intersection(["A", "B", "D"]))
mydf = df[good_columns]

But this has two problems:

  • It's clunky and inelegant.
  • The ordering of mydf.columns could be ["A", "B"] or ["B", "A"].
hlin117

2 Answers

You can use filter, which simply ignores any keys that are not present:

df.filter(["A","B","D"])
    A     B  
0   3     3   
1   2     1   
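
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and passes the labels through the `items` keyword of `DataFrame.filter`; the variable names `subset` and `empty` are only illustrative.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({"A": [3, 2], "B": [3, 1], "C": [1, 9]})

# "D" is not a column of df; filter drops it silently instead of raising.
subset = df.filter(items=["A", "B", "D"])
print(subset)

# If none of the requested labels exist, the result has no columns
# but keeps the original row index.
empty = df.filter(items=["X", "Y"])
print(empty.shape)
```

Note that a misspelled column name is dropped just as silently as a genuinely absent one, so this is best used when missing columns are expected.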
maxymoo
  • Thank you. I wish the pandas documentation had usage examples for each function, just like scikit-learn. – hlin117 Nov 30 '15 at 06:23
  • Why don't you consider submitting some yourself? Documentation is a great way to get started with contributing to open-source projects. – maxymoo Nov 30 '15 at 22:29

You can use a conditional list comprehension:

>>> target_cols = ['A', 'B', 'D']
>>> df[[c for c in target_cols if c in df]]
   A  B
0  3  3
1  2  1
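
The same idea can be wrapped in a small helper, sketched below as a runnable script. The name `select_existing` is hypothetical, not a pandas function. Because the comprehension iterates over the requested labels, the result follows the order you asked for, which addresses the ordering concern from the question.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({"A": [3, 2], "B": [3, 1], "C": [1, 9]})


def select_existing(frame, wanted):
    """Return the columns of `frame` named in `wanted`, in the order requested."""
    # `label in frame` tests membership in frame.columns, not in the cell values.
    return frame[[label for label in wanted if label in frame]]


# "D" is skipped; "B" comes before "A" because that is the order requested.
result = select_existing(df, ["B", "A", "D"])
print(result)
```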
Alexander
  • Looks like that's an `O(n)` check to see if `c in df`. I'll stick with @maxymoo's answer. Thanks! – hlin117 Nov 30 '15 at 06:02