
I feel like I'm just doing something dumb, but I'm having trouble getting a particular filter / subselect to work in pandas.

So I've got a dataframe that I'm trying to clean. It contains a field called 'category', and I want to filter out records containing the substring 'BTC' in the 'category' field.

When I try

df['BTC' not in df['category']]

I get a 'KeyError: False'. I think I have a basic misunderstanding of how filtering / subselects work in pandas, because I thought it was based on describing a boolean condition to select the data. For instance, I can do

df[df['category'] == 'something']

which evaluates to True for some subset of the rows and returns them. 'BTC' not in df['category'] appears to also be a boolean expression, but pandas doesn't seem to like it. What am I missing here? I would love a little bit of background to clear up what I feel like are some misconceptions I have on how this works, despite being a daily pandas user.
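For background, here is a minimal sketch of the difference between the two expressions. The sample rows are made up for illustration; only the 'category' column name comes from the question. As far as I understand it, `in` applied to a Series checks the index labels rather than the values and produces a single Python bool, so `df[...]` ends up looking for a column named True or False. A per-row substring test such as `Series.str.contains` produces a boolean mask that can be inverted with `~`.

```python
import pandas as pd

# Hypothetical sample data; only the 'category' column name is from the question.
df = pd.DataFrame({"category": ["BTC-USD", "ETH-USD", "spot BTC", "equities"]})

# `in` on a Series tests the index labels (here 0..3), not the values,
# and yields one plain Python bool instead of one bool per row.
single_flag = "BTC" not in df["category"]

# df[single_flag] would then be a column lookup for that bool, hence the KeyError.

# An elementwise substring test returns a boolean Series aligned with the rows.
# regex=False treats 'BTC' as a literal; na=False counts missing values as no match.
has_btc = df["category"].str.contains("BTC", regex=False, na=False)

# `~` inverts the mask, keeping only the rows without the substring.
cleaned = df[~has_btc]
```

By contrast, `df['category'] == 'something'` works because `==` is applied elementwise and returns a boolean Series, whereas `in` / `not in` always collapse to a single bool.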

I've looked through other filtering questions here and on other forums and can't quite find one that fits this situation. I feel like there must be one, though, so if I missed something, please point me to it, and apologies for the duplicate. Thanks for your help.

zyd
  • Well, you can select the rows you want with `df[df['category'] == 'something']`, and you can negate the condition with the `~` operator: `df[~(df['category'] == 'something')]`. The parentheses are needed because `~` binds more tightly than `==`. – id101112 Oct 27 '18 at 20:47
  • Yeah, I am aware of that; it just doesn't seem to work when looking for a substring with the *in* operator. But the duplicate that got marked for this was exactly what I was looking for. Thanks! – zyd Oct 27 '18 at 21:50
