I want to write a function that takes a list of strings, checks whether a particular column contains every one of them, and returns a boolean mask. I can easily do this for a single string, but I'm stumped on how to do it for a list of strings.

# Single String Example
def mask(x, df):
    return df.description.str.contains(x)
df[mask('sql')]

# Roughly what I want, written out by hand for three strings
def mask(x, df):
    return df.description.str.contains(x[0]) & df.description.str.contains(x[1]) & df.description.str.contains(x[2])  # & ... for however many strings are in x
df[mask(['sql', 'python', 'pandas'], df)]

Any help would be appreciated :)

It looks like I figured out a way to do it. It's a little unorthodox, but it seems to work. Solution below:

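import numpy as np

def mask(x, df):
    # Multiply the per-string boolean masks; the product is 1 only where every string matches
    X = np.prod([df.description.str.contains(i) for i in x], axis=0)
    return [True if i == 1 else False for i in X]
my_selection = df[mask(['sql', 'python'], df)]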
H K
  • Possible duplicate of [pandas dataframe str.contains() AND operation](https://stackoverflow.com/questions/37011734/pandas-dataframe-str-contains-and-operation) – Chris Jul 29 '19 at 02:06

2 Answers

Try using:

def mask(x, df):
    return df.description.str.contains(''.join(map('(?=.*%s)'.__mod__, x)))
df[mask(['a', 'b'], df)]

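Chaining (?=.*<word>) lookaheads one after another works as an AND operator: each lookahead is checked from the same position without consuming any text, so the pattern matches only if every word appears somewhere in the string.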
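To see the chained lookaheads in isolation, here is a minimal sketch using only the standard re module. The helper name and_pattern is made up for this illustration, and wrapping each term in re.escape is an added assumption so that terms such as 'c++' are treated literally rather than as regex syntax.

```python
import re

def and_pattern(terms):
    # Hypothetical helper: build one lookahead per term.
    # re.escape keeps characters such as "+" or "." literal.
    return "".join("(?=.*{})".format(re.escape(t)) for t in terms)

pattern = and_pattern(["sql", "python"])

# Both terms present, in any order: the pattern matches.
both = bool(re.search(pattern, "python and sql"))
# Only one term present: the second lookahead fails everywhere.
only_one = bool(re.search(pattern, "sql only"))
```

Since str.contains treats its argument as a regular expression by default, the resulting pattern string should be usable as df.description.str.contains(pattern) as well.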

U13-Forward

Managed to work out a solution here:

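import numpy as np

def mask(x, df):
    # Multiply the per-string boolean masks; the product is 1 only where every string matches
    X = np.prod([df.description.str.contains(i) for i in x], axis=0)
    return [True if i == 1 else False for i in X]
mine = df[mask(['sql', 'python'], df)]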

It's a little unorthodox, so if anyone has something better, it would be appreciated.
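One possible alternative to multiplying the masks, offered as a sketch rather than a definitive answer: combine them with a logical AND directly. The name mask_all, the column parameter, and the sample DataFrame are assumptions for illustration. regex=False and na=False are also added choices here, so that terms are matched as literal substrings and missing descriptions count as non-matches.

```python
import numpy as np
import pandas as pd

def mask_all(terms, df, column="description"):
    # One boolean Series per term; regex=False means plain substring matching.
    per_term = [df[column].str.contains(t, regex=False, na=False) for t in terms]
    # Element-wise AND across all per-term masks, re-attached to the frame's index.
    return pd.Series(np.logical_and.reduce(per_term), index=df.index)

# Made-up sample data for illustration only.
df = pd.DataFrame({"description": ["sql and python", "python only", "sql only", None]})
selected = df[mask_all(["sql", "python"], df)]
```

This returns a boolean Series rather than a list, so it can be combined with other masks using & and | before indexing.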

H K