There are a bunch of similar questions that all have the same solution, for example "How to check if a string contains an element from a list in Python" and "How to check if a line has one of the strings in a list?". They all ask the same thing: how do I check my list of strings against a larger string and see if there's a match?
I have a different problem: how do I check my list of strings against a larger string, see if there's a match, and isolate the string so I can perform another string operation relative to the matched string?
Here's some sample data:
| id | data |
|--------|---------------------|
| 123131 | Bear Cat Apple Dog |
| 123131 | Cat Ap.ple Mouse |
| 231321 | Ap ple Bear |
| 231321 | Mouse Ap ple Dog |
Ultimately, I'm trying to find all instances of "apple" (`['Apple', 'Ap.ple', 'Ap ple']`) and, while it doesn't really matter which one is matched, I need to be able to find out whether `Cat` or `Bear` exists before it or after it. The position of the matched string does not matter, only the ability to determine what comes before or after it.
In `Bear Cat Apple Dog`, `Bear` is before `Apple`, even though `Cat` is in the way.
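To make the before/after requirement concrete, here is a minimal sketch of the kind of helper I have in mind. The name `split_around_match` is made up for illustration, and it assumes only the first variant that occurs in the string matters.

```python
def split_around_match(text, variants):
    """Return (before, matched, after) for the first variant found in text, or None."""
    for variant in variants:
        before, matched, after = text.partition(variant)
        if matched:  # partition returns an empty separator when the variant is absent
            return before, matched, after
    return None


variants = ["Apple", "Ap.ple", "Ap ple"]
before, matched, after = split_around_match("Bear Cat Apple Dog", variants)
print("Bear" in before.split())  # True: Bear is before Apple, even with Cat in between
print("Bear" in after.split())   # False
```

Splitting `before` and `after` into words is what lets `Bear` count as "before" regardless of how many other words sit in between.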
Here's where I am at with my sample code:

    import pandas as pd

    data = [[123131, "Bear Cat Apple Dog"], [123131, "Cat Ap.ple Mouse"], [231321, "Ap ple Bear"], [231321, "Mouse Ap ple Dog"]]
    df = pd.DataFrame(data, columns=['id', 'data'])

    def matching_function(m):
        matching_strings = ['Apple', 'Ap.ple', 'Ap ple']
        # True as soon as any variant occurs anywhere in the string
        if any(x in m for x in matching_strings):
            # do something to print the matched string
            return True
        return False

    df["matched"] = df['data'].apply(matching_function)

Would it simply be better to do this with a regex?
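For comparison, a regex version might look roughly like the sketch below. It builds one alternation from the list with `re.escape`, so the `.` in `Ap.ple` stays a literal dot; `words_around` is a hypothetical helper name, and only the first match in each string is considered.

```python
import re

variants = ["Apple", "Ap.ple", "Ap ple"]
# re.escape keeps the "." in "Ap.ple" literal instead of meaning "any character"
APPLE_RE = re.compile("|".join(re.escape(v) for v in variants))


def words_around(text):
    """Return (words_before, matched, words_after), or None when no variant matches."""
    m = APPLE_RE.search(text)
    if m is None:
        return None
    # m.start() and m.end() give the span of the match, so slicing isolates both sides
    return text[:m.start()].split(), m.group(0), text[m.end():].split()


print(words_around("Cat Ap.ple Mouse"))  # (['Cat'], 'Ap.ple', ['Mouse'])
```

The match object exposes the matched text and its position in one step, which is the part the plain `any(x in m ...)` check cannot give me.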
Right now, the function simply returns `True`. But if there's a match, I imagine it could instead return `matched_bear_before` or `matched_bear_after` (or the same for Cat) and fill that into the `df['matched']` column.
Here's some sample output I'd like to end up with:
| id | data | matched |
|--------|---------------------|---------|
| 123131 | Bear Cat Apple Dog | TRUE |
| 123131 | Cat Ap.ple Mouse | TRUE |
| 231321 | Ap ple Bear | TRUE |
| 231321 | Mouse Ap ple Dog | FALSE |
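A rough sketch of a function that would produce both the labels and the TRUE/FALSE column above is shown here. The pattern `Ap[. ]?ple` is my assumption for covering the three spellings, `label_row` and `ANIMALS` are illustrative names, and the label format simply follows the `matched_bear_before` idea.

```python
import re

APPLE_RE = re.compile(r"Ap[. ]?ple")  # assumed to cover Apple, Ap.ple and "Ap ple"
ANIMALS = ("Bear", "Cat")


def label_row(text):
    """Return labels such as 'matched_bear_before'; an empty list means no label applies."""
    m = APPLE_RE.search(text)
    if m is None:
        return []
    before, after = text[:m.start()].split(), text[m.end():].split()
    labels = []
    for animal in ANIMALS:
        if animal in before:
            labels.append(f"matched_{animal.lower()}_before")
        if animal in after:
            labels.append(f"matched_{animal.lower()}_after")
    return labels


rows = ["Bear Cat Apple Dog", "Cat Ap.ple Mouse", "Ap ple Bear", "Mouse Ap ple Dog"]
for row in rows:
    labels = label_row(row)
    print(row, labels, bool(labels))
```

With the DataFrame from my sample code, something like `df["matched"] = df["data"].apply(label_row).apply(bool)` should then give the TRUE/FALSE column, while keeping the intermediate list preserves which animal was found on which side.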