I have the following function to detect direction strings in my data. I joined both the keys and the values of the dictionary since I want to match both the full names and the abbreviations. I added ^ and $ because I only want exact matches.
Function
import pandas as pd

def check_direction(df):
    # dict of all directions and their abbreviations
    direction = {
        '^Northwest$': '^NW$',
        '^Northeast$': '^NE$',
        '^Southeast$': '^SE$',
        '^Southwest$': '^SW$',
        '^North$': '^N$',
        '^East$': '^E$',
        '^South$': '^S$',
        '^West$': '^W$'}
    # combining all the dict keys and values into one pattern for str match
    all_direction = direction.keys() | direction.values()
    all_direction = '|'.join(all_direction)
    df = df.astype(str)
    df = pd.DataFrame(df.str.contains(all_direction, case=False))
    return df
I ran tests on the following series, which worked as intended:
tmp = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday'])
check_direction(tmp)
0 False
1 False
2 False
3 False
tmp = pd.Series(['SOUTH', 'NORTHEAST', 'WEST'])
check_direction(tmp)
0 True
1 True
2 True
However, I ran into problems here:
tmp = pd.Series(['32 Street NE', 'Ogden Road SE'])
check_direction(tmp)
0 False
1 False
Both returned False when they should be True because of NE and SE. How can I modify my code to make that happen?
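For reference, here is a minimal sketch of one possible direction, not a definitive fix. The likely cause is that ^ and $ anchor each alternative to the start and end of the whole string, so 'NE' can only match when the entire value is 'NE'. The sketch swaps those anchors for \b word boundaries, which is an assumption about what "exact match" should mean here (a whole word anywhere in the string). The list name `directions`, the parameter name `series`, and returning a boolean Series instead of a DataFrame are choices made for illustration.

```python
import re
import pandas as pd

def check_direction(series):
    # Full names and abbreviations, written without ^/$ anchors
    directions = ['Northwest', 'Northeast', 'Southeast', 'Southwest',
                  'North', 'East', 'South', 'West',
                  'NW', 'NE', 'SE', 'SW', 'N', 'E', 'S', 'W']
    # \b matches at word boundaries, so 'NE' is found in '32 Street NE'
    # but not inside a longer word such as 'Wednesday'.
    # (?:...) is a non-capturing group, so the boundaries apply to every alternative.
    pattern = r'\b(?:' + '|'.join(map(re.escape, directions)) + r')\b'
    # Returns a boolean Series; wrap it in pd.DataFrame(...) if a frame is needed
    return series.astype(str).str.contains(pattern, case=False, regex=True)

print(check_direction(pd.Series(['32 Street NE', 'Ogden Road SE'])))
```

One caveat with this approach: the single-letter abbreviations will also match any standalone letter, for example the trailing "s" in "John's", since the apostrophe counts as a word boundary. If that matters for the data, the single letters may need stricter handling.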