
I have a list of 1,000 company names and a DataFrame (df) of all transactions for the year. For every row whose customer name matches a company in the list, I would like to set the value True in a new column, df['Covered'].

I am not sure why I keep getting the errors below. I read through the following related questions, but no luck so far:

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code Example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's", ...]

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1

Code Example: when I set regex=True

error: bad character range H-D at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break
pandas
  • are you able to add some sample data? – Umar.H Feb 24 '20 at 17:18
  • Is it possible to post the output of df.sample().to_dict()? That would help to recreate and test the problem. – instinct246 Feb 24 '20 at 17:23
  • df['End Customer Name'] has 100k+ rows of names, while Customer_List is a list of 1,000 company names. Does that help? – pandas Feb 24 '20 at 17:24
  • Why are you saying 'regex=False'? You are creating a regular expression by joining your terms with the 'bar' symbol, which means OR in regex. – Scott Boston Feb 24 '20 at 17:24
  • Thanks Scott, I didn't know if I needed a literal string or Regex. Do you think it has to do with having a special character? – pandas Feb 24 '20 at 17:32
  • @pandas _Do you think it has to do with having a special character?_ What do you mean? – AMC Feb 24 '20 at 18:06
  • Please provide a [mcve], as well as the entire error message(s). – AMC Feb 24 '20 at 18:07
  • Thank you AMC, is that better? I thought it may have to do with a special character after reading this https://stackoverflow.com/questions/41659309/got-bad-character-range-in-regex-when-using-comma-after-dash-but-not-reverse – pandas Feb 24 '20 at 18:18

2 Answers


How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1
TaxpayersMoney
  • Thanks TaxpayersMoney, but there are many rows in which a Customer_List entry is only a substring of the 'End Customer Name' value, which is why I was using contains. Example: 'End Customer Name' values are Apple Inc, Apple Incorporation, Apple Inc. while Customer_List is ["Apple Inc"]. – pandas Feb 24 '20 at 17:58
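
To illustrate the difference raised in this comment, here is a minimal sketch comparing exact matching with substring matching. The Apple names come from the comment; "Banana Ltd" and the variable names are made up for illustration.

```python
import pandas as pd

# Sample names: the Apple variants are from the comment, "Banana Ltd" is hypothetical.
names = pd.Series(["Apple Inc", "Apple Incorporation", "Apple Inc.", "Banana Ltd"])
customers = ["Apple Inc"]

# isin() is True only when the whole string equals a list entry.
exact = names.isin(customers)

# str.contains() with regex=False is True when the literal text appears anywhere in the string.
partial = names.str.contains("Apple Inc", case=False, regex=False)

print(exact.tolist())
print(partial.tolist())
```

With isin only the first row matches, whereas contains also picks up "Apple Incorporation" and "Apple Inc.".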

Thanks everyone. The problem was that my Customer_List contains special regex characters, so I needed to escape each name with re.escape (via map) before joining them.

This question helped me: Python regex bad character range.
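
A minimal sketch of that fix, assuming made-up sample data: the company name "Acme [H-D]" and the rows of df are hypothetical, chosen so that the unescaped pattern would raise the same "bad character range H-D" error. It also assigns the boolean result of str.contains directly to the new column. The original code assigned df[mask], which is a whole DataFrame, and that is most likely what triggered the "Wrong number of items passed" ValueError.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the original question.
Customer_List = ["3M", "Cargill", "Chili's", "Acme [H-D]"]
df = pd.DataFrame(
    {"End Customer Name": ["3M Company", "cargill inc", "Unknown Co", None, "Acme [H-D] Holdings"]}
)

# Escape every name so that characters such as '[', ']', '-' and '.' are matched literally,
# then join the escaped names with '|' (regex OR).
pattern = "|".join(map(re.escape, Customer_List))

# str.contains returns a boolean Series; assign it directly as the new column.
# na=False turns missing names into False instead of NaN.
df["Covered"] = df["End Customer Name"].str.contains(pattern, case=False, na=False, regex=True)

print(df)
```

Because the pattern is a regular expression, regex=True (the default) is the right setting here; regex=False would look for the whole joined string, bars included, as one literal.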

pandas