python select subset where df column value contains one of the values in an array

Question

I have created a simple pandas DataFrame in Python with these columns:

Columns: [index, bulletintype, category, companyname, date, url]

I also have a simple array of company names:

companies = ['x', 'y', 'z']

I would like to create a subset of the dataframe if the column 'companyname' matches on one or more of the names in the companies array.

subset = df[df['companyname'].isin(companies)]

This works well, but .isin requires an exact match and my sources don't use the same names. So I'm looking for an alternative and would like to compare on parts of the name instead. I'm familiar with .str.contains('part of the name'), but I can't use this function with an array. Can somebody help me achieve something like this (but with working code :-)

subset = df[df['companyname'].contains(companies)]

score 2 · Answer 1 · answered Oct 07 '18 at 11:45

Try building a regex pattern by joining your companies list with the regex OR character (|), then use Series.str.contains to create a boolean mask:

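companies = ['x', 'y', 'z']
pat = '|'.join(companies)  # regex alternation: 'x|y|z'
# use the 'companyname' column from the question as the boolean mask
df[df['companyname'].str.contains(pat)]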
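
If the name fragments can contain regex special characters (parentheses, dots and so on), or the column has missing values or inconsistent capitalisation, a slightly more defensive variant might look like the sketch below. The sample DataFrame and company names are made up for illustration. Escaping with re.escape and passing case=False and na=False are my assumptions about what is wanted here, not something the question asks for.

```python
import re

import pandas as pd

# Hypothetical sample data; the company names are invented for illustration.
df = pd.DataFrame({
    "companyname": ["Acme Corp.", "ACME Holdings", "Globex (UK) Ltd", "Initech", None],
    "category": ["a", "b", "c", "d", "e"],
})

# Name fragments to look for.
companies = ["acme", "Globex (UK)"]

# Escape each fragment so characters like '(' or '.' are matched literally,
# then join with '|' so the pattern matches any one of them.
pat = "|".join(re.escape(name) for name in companies)

# case=False ignores capitalisation; na=False treats missing names as non-matches.
mask = df["companyname"].str.contains(pat, case=False, na=False)
subset = df[mask]

print(subset)
```

Without the escaping step, a fragment such as 'Globex (UK)' would be read as a regex group rather than literal parentheses, so it would not match the name as written.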

answered Oct 07 '18 at 11:45

Chris Adams

Thanks Chris! Great and efficient solution. – bsparks Oct 07 '18 at 18:40
