How to find a string match in df col based on list of strings?

Question

I have a list of 1000 corporate companies and a df of all previous transactions for the year. For every match, I would like to set the value True in a new column (df['Covered']).

I am not sure why I keep getting the errors below. I tried researching these questions but no luck so far.

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code Example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's", ...]  # list continues, about 1000 names

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1

Code Example: when I set regex=True

error: bad character range H-D at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break

possible to post the O/P of df.sample().to_dict() - that will help to recreate/test the problem. — instinct246, Feb 24 '20 at 17:23
df['End Customer Name'] are 100k+ rows of names while Customer_List is a list of 1000 company names, does that help? — pandas, Feb 24 '20 at 17:24
Why are you saying 'regex=False'? You are creating a regular expression by joining your terms with the 'bar' symbol, which means OR in regex. — Scott Boston, Feb 24 '20 at 17:24
Thanks Scott, I didn't know if I needed a literal string or Regex. Do you think it has to do with having a special character? — pandas, Feb 24 '20 at 17:32
@pandas _Do you think it has to do with having a special character?_ What do you mean? — AMC, Feb 24 '20 at 18:06
Please provide a [mcve], as well as the entire error message(s). — AMC, Feb 24 '20 at 18:07
Thank you AMC, is that better? I thought it may have to do with a special character after reading this https://stackoverflow.com/questions/41659309/got-bad-character-range-in-regex-when-using-comma-after-dash-but-not-reverse — pandas, Feb 24 '20 at 18:18

score 0 · Answer 1 · answered Feb 24 '20 at 17:43

How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1

TaxpayersMoney


Thanks TaxpayersMoney, but there are many rows in which the Customer_List is a substring in the 'End Customer Name' string, which is why I was using contains. Example: End Customer Name -Apple Inc, Apple Incorporation, Apple Inc. Customer List ["Apple Inc"] – pandas Feb 24 '20 at 17:58

score 0 · Answer 2 · answered Feb 25 '20 at 01:20

Thanks everyone. The problem was that my Customer_List contains special characters, so I needed to escape each name with re.escape (applied with map) before joining the names into the pattern.

This link helped me: Python regex bad character range.
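For anyone landing here, a minimal sketch of what that fix might look like. The sample rows and the shortened Customer_List are made up for illustration; only the column names and the str.contains arguments come from the question. It also assigns the boolean result of str.contains directly to the new column instead of wrapping it in df[...], which I assume is what caused the "Wrong number of items passed" error, since df[mask] returns a whole DataFrame rather than a single column.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real 100k+ row transactions table.
df = pd.DataFrame({
    'End Customer Name': [
        'Apple Inc.',
        'Apple Incorporation',
        "Chili's Grill",
        'A+B (US) Ltd',
        'Unrelated Co',
        None,
    ]
})

# Hypothetical short list; the real one has about 1000 names.
Customer_List = ['3M', 'Apple Inc', "Chili's", 'A+B (US)']

# Escape regex metacharacters in every name, then join with '|' (regex OR).
pattern = '|'.join(map(re.escape, Customer_List))

# Assign the boolean Series itself (not df[mask]) to the new column.
df['Covered'] = df['End Customer Name'].str.contains(
    pattern, case=False, na=False, regex=True
)

print(df)
```

Because the names are escaped, characters such as +, (, ) or a dash inside square brackets are matched literally instead of being parsed as regex syntax. Note that regex=True is needed here: with regex=False the whole joined string, bars included, is searched for as one literal substring.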

pandas

