2

So this link (Deleting DataFrame row in Pandas based on column value) describes how to do a very simple column deletion based on a value being present...

...This line will delete rows where the column line_race contains the value 0:

df = df[df.line_race != 0]

My question is a little more in depth though.

I would like to delete all the rows in my DataFrame where the value in the first column is NOT a certain value. Specifically, if the first column fish_frame[0] contains a string that doesn't match a value from another list stocks, then delete it.

I thought something like this would do the trick, but it didn't:

fish_frame = fish_frame[fish_frame[0] == (any s in fish_frame[0] for s in stocks)]

The first column of my DataFrame, fish_frame[0]:

0        NEFS X Available stocks
1                      11/6/2013
2                     Chris King
3            Package $4000 or BO
4            GOM Winter Flounder
5                        HAD GBW
6                        POLLOCK
7                  WINTER FL SNE
8                        COD GBW
9                      WHITEHAKE
10                       HAD GBE
11                WINTER FL GBbb
12                     Grey Sole
13                YELLOWTAIL SNE
14                 YELLOWTAIL GB
15                       REDFISH
16                           NaN
17                        List A
18         Package $10,000 or BO
19           GOM Winter Flounder

And my list stocks:

(
        'GB COD EAST',
        'GB COD EAST ACE',
        'GB COD WEST ACE',
        'GBE COD',
        'GB COD WEST',
        'GBW COD',
        'GOM COD',
        'GB HADDOCK EAST',
        'GBE HADDOCK',
        'GBW HADDOCK',
        'GBE HADD',
        'GB HADDOCK WEST',
        'GBYT',
        'GB YT',
        'GBW HADD',
        'GOM HADDOCK',
        'GOM HADD',
        'GOM HAD',
        'GOM HADOCK',
        'PLAICE',
        'DABS',
        'POLLOCK',
        'POLL',
        'REDFISH',
        'REDS',
        'RED',
        'WHITE HAKE' ,
        'WHITEHAKE',
        'WHAKE',
        'WHAK',
        'GB WINTER FLOUNDER',
        'GB WINTER FL',
        'GB BB',
        'GB WINTER',
        'GB BLACK BACKS',
        'GB BLACKBACKS',
        'GOM WINTER FLOUNDER',
        'WINTER GOM FLOUNDER',
        'GOM BLACKBACKS',
        'GOM BB',
        'GOM WINTER',
        'SNE WINTER FLOUNDER',
        'SNE WINTER',
        'SNE/MA WINTER FLOUNDER',
        'SNE/MA YELLOWTAIL',
        'SNE BLACKBACK',
        'SNE BLACKBACKS',
        'SNE BB',
        'WITCH FLOUNDER',
        'WITCH FL',
        'WITCH',
        'WHICH',
        'WHITCH',
        'GREYSOLE',
        'GREY SOLE',
        'CC/GOM YELLOWTAIL FLOUNDER',
        'GOM YELLOWTAIL',
        'GOM YELLOW TAIL',
        'GOM YT',
        'GB YELLOWTAIL FLOUNDER',
        'GB YELLOWTAIL',
        'GB YT',
        'SNE/MA YELLOWTAIL FLOUNDER',
        'SNE YT',
        'SNE YELLOWTAIL',
        'SNE YELLOW TAIL',
        'SCALLOP IFQ'
        )

Any help solving this would be appreciated, thanks.

theprowler
  • 3,138
  • 11
  • 28
  • 39

1 Answers1

2

You want to construct a regex pattern from the stocks object. Then use pd.Series.str.contains to test if there is a match. Finally, use boolean indexing to remove those that don't have a match.

df[df[0].astype(str).str.contains('|'.join(stocks))]

6       POLLOCK
9     WHITEHAKE
15      REDFISH
Name: 0, dtype: object
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Ohhh ok gotcha. That solves my problem, thanks a bunch. It also unfortunately creates another problem (for me to solve) because now I have to add all the possible variations in spelling of fish to my list. But thanks again for the help. – theprowler Aug 17 '17 at 19:35
  • Forgive me for not understanding regex very well but if I wanted to add an `.IGNORECASE` command in there as well, where I would put it? – theprowler Aug 18 '17 at 15:53
  • 1
    It's not regex, it's pandas. First import the regex module with `import re`. Then use the `re.IGNORECASE` in the `str.contatins` like this `df[df[0].astype(str).str.contains('|'.join(stocks), flags=re.IGNORECASE)]` – piRSquared Aug 18 '17 at 15:57
  • Ohh so is the construction of a regex pattern, like in your answer, different from USING regex? – theprowler Aug 18 '17 at 16:02
  • 1
    No, I meant your misunderstanding is not of regex, it's a misunderstanding of where to pass the regex flag via pandas api. – piRSquared Aug 18 '17 at 16:03
  • Ohhhh ok ok that's clearer. Now it's making more sense. That helps a lot thanks for the clarification. – theprowler Aug 18 '17 at 16:05
  • No problem, glad I could help. – piRSquared Aug 18 '17 at 16:06