
The constructs to work with include:

Filter with like or regex or in:

 series.filter(like='some pattern')

 series.filter(regex='some regex')

But those are positive filters, not negative ones.

On a DataFrame we can negate with a tilde as follows:

  df.filter(~('some pattern' in df['some_column']))

But that is not available on a Series. So what is the not filter on a Series?

WestCoastProjects
  • Do you want to filter out columns or rows? `filter` gets you columns, so your syntax seems wrong. – cs95 Apr 01 '18 at 23:42
  • It looks like you could just do `df[~df['some_column'].str.contains(pattern)]` if you're trying to filter rows. Same for `df.columns`, just use `df.loc[:, ~df.columns.str.contains(pattern)]` for filtering on columns. – cs95 Apr 01 '18 at 23:43
  • @cᴏʟᴅsᴘᴇᴇᴅ Your syntax seems to be for a `DataFrame`, not a `Series`: the latter does not include column specifiers – WestCoastProjects Apr 01 '18 at 23:45
  • Sorry about that, I misread. However, my first suggestion to filter on rows applies. `series[~series.str.contains(pattern)]` – cs95 Apr 01 '18 at 23:46
  • what is the `series.str` ? – WestCoastProjects Apr 01 '18 at 23:54
  • It's the accessor you use to run vectorised string functions on the series/dataframe columns. – cs95 Apr 01 '18 at 23:57
  • Sorry, my code had become messy. Now that it is cleaned up, it is back to a series, and your suggestion about `~series.str.contains` works well. Please make it an answer – WestCoastProjects Apr 01 '18 at 23:58
  • By the way, are there other important objects in `Series` similar to `str`, i.e. ones that permit accessing additional methods? For example `series.int`? – WestCoastProjects Apr 01 '18 at 23:59
  • No, that's alright. I think your question is a duplicate of this: https://stackoverflow.com/q/28679930/4909087 You can delete or self-mark. – cs95 Apr 01 '18 at 23:59
  • Well, there is the `.dt` for dates and `.cat` for categorical columns. Other types do not have an accessor. – cs95 Apr 02 '18 at 00:00
  • @cᴏʟᴅsᴘᴇᴇᴅ mine is not a duplicate because it is about a *series* (i had already seen the dataframe questions..): and your answer is in fact also *new* material. – WestCoastProjects Apr 02 '18 at 00:00
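The distinction raised in the comments above can be sketched as follows: on a Series, `filter(like=...)` matches index labels, while a negated `str.contains` mask selects by value. The index labels and sample values here are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the index labels are invented for this sketch.
s = pd.Series(['ABC123', 'ABCdef', 'hijk'], index=['row_a', 'row_b', 'other'])

# Series.filter looks at the index labels, not at the stored values.
by_label = s.filter(like='row')

# A negated boolean mask built with the .str accessor looks at the values.
by_value = s[~s.str.contains('ABC')]

print(by_label)
print(by_value)
```

So a "not" filter on values is written as boolean indexing rather than as an argument to `filter`.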

1 Answer

The idiomatic method for filtering series is to use str.contains and negate the result.

series = series[~series.str.contains(pattern)]

If your pattern is not regex (but rather a simple substring pattern), I'd suggest a list comprehension as a faster alternative:

series = series[[pattern not in v for v in series]]

s = pd.Series(['ABC123', 'ABCdef', 'hijk'])

s[~s.str.contains('ABC')] 
2    hijk
dtype: object
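A minimal sketch of the plain-substring case, assuming the series may contain missing values (the `None` entry is added here for illustration and is not in the example above). `regex=False` treats the pattern as a literal string, and `na=False` makes missing entries count as "no match", so the negated mask stays boolean.

```python
import pandas as pd

# Sample data; the None entry is an assumption added to show missing values.
s = pd.Series(['ABC123', 'ABCdef', 'hijk', None])

# Literal substring match; missing values are treated as non-matching,
# so after negation they are kept.
literal = s[~s.str.contains('ABC', regex=False, na=False)]

# The same selection with a list-comprehension mask. Non-string entries
# are kept, mirroring na=False above.
keep = ['ABC' not in v if isinstance(v, str) else True for v in s]
by_comprehension = s[keep]

print(literal)
print(by_comprehension)
```

Indexing with the boolean list keeps the original index labels, whereas building a new `pd.Series` from the surviving values would renumber them from zero.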
cs95
  • I'd like to point out this answer brings up an important (and previously unfamiliar to me) attribute **`.str`** of a `Series` - and which distinguishes it from a `DataFrame` – WestCoastProjects Apr 02 '18 at 00:02
  • @javadba Yup, also if you were to perform this operation on a dataframe column, you would still need the `.str` accessor since dataframe columns are nothing but Series :) – cs95 Apr 02 '18 at 00:03
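To illustrate the last comment, here is a small sketch of the same negated filter applied to a DataFrame column, plus the `.dt` accessor mentioned earlier in the thread. The column name `when` and the dates are hypothetical.

```python
import pandas as pd

# Hypothetical frame: 'when' and its dates are invented for this sketch.
df = pd.DataFrame({
    'some_column': ['ABC123', 'ABCdef', 'hijk'],
    'when': pd.to_datetime(['2018-04-01', '2018-04-02', '2018-04-02']),
})

# A DataFrame column is a Series, so the .str accessor is still required.
kept = df[~df['some_column'].str.contains('ABC')]

# The .dt accessor plays the same role for datetime columns.
not_first_of_month = df[~(df['when'].dt.day == 1)]

print(kept)
print(not_first_of_month)
```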