I have a problem. I have a free-text column, and a regex should recognize elements that match a pattern. Unfortunately, some elements also appear as abbreviations, so I created an abbreviation dict. Is there a way to also loop through the dict when the element is in it, so that the abbreviation (for example "ca" for "cat") also matches?
Dataframe
   customerId                text element  code
0           1  Something with Cat     cat     0
1           3  That is a huge dog     dog     1
2           3         Hello agian   mouse     2
3           3        This is a ca     cat     0
Code
import pandas as pd
import copy
import re

d = {
    "customerId": [1, 3, 3, 3],
    "text": ["Something with Cat", "That is a huge dog", "Hello agian", "This is a ca"],
    "element": ["cat", "dog", "mouse", "cat"],
    "code": [9, 8, 7, 9],
}
df = pd.DataFrame(data=d)
df['code'] = df['element'].astype('category').cat.codes
print(df)

abbreviation = {
    "cat": {
        "abbrev1": "ca",
    },
}
%%time
elements = df['element'].unique()

def f(x):
    match = 999  # default when no element is found in the text
    for element in elements:
        # This is where the abbreviations should also be checked.
        if re.search(element, x['text'], re.IGNORECASE):
            match = x['code']
            break
    x['test'] = match
    return x

df['test'] = None
df = df.apply(f, axis=1)
What I have
   customerId                text element  code  test
0           1  Something with Cat     cat     0     0
1           3  That is a huge dog     dog     1     1
2           3         Hello agian   mouse     2   999
3           3        This is a ca     cat     0   999
What I want
   customerId                text element  code  test
0           1  Something with Cat     cat     0     0
1           3  That is a huge dog     dog     1     1
2           3         Hello agian   mouse     2   999
3           3        This is a ca     cat     0     0
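One possible direction, as a rough sketch rather than a definitive solution: build one regex per element that contains the element name plus every abbreviation listed for it, and search with that. The helper names build_pattern and find_code and the patterns dict are made up for illustration. The sketch also makes two assumptions that are not in my original code: each row is only checked against its own element, and matches must be whole words (\b), so that "ca" does not match inside unrelated words.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "customerId": [1, 3, 3, 3],
    "text": ["Something with Cat", "That is a huge dog", "Hello agian", "This is a ca"],
    "element": ["cat", "dog", "mouse", "cat"],
})
df["code"] = df["element"].astype("category").cat.codes

abbreviation = {"cat": {"abbrev1": "ca"}}


def build_pattern(element):
    """Hypothetical helper: one regex for an element and all of its abbreviations."""
    # Elements without an entry in the dict simply get no extra variants.
    variants = [element, *abbreviation.get(element, {}).values()]
    # Try longer variants first; re.escape protects any regex metacharacters.
    variants.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    # Assumption: whole-word matches only, case-insensitive.
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


# Compile each pattern once instead of once per row.
patterns = {element: build_pattern(element) for element in df["element"].unique()}


def find_code(row, default=999):
    """Hypothetical helper: the row's code if its element or an abbreviation is in the text."""
    return row["code"] if patterns[row["element"]].search(row["text"]) else default


df["test"] = df.apply(find_code, axis=1)
print(df)
```

With the sample data this gives 0, 1, 999, 0 in the test column, which is the output under "What I want". If the whole-word assumption is too strict, dropping the \b anchors falls back to plain substring matching like in my original code.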