This function looks at strings in a pandas DataFrame. If the string contains a regular expression match that is also a key in the dictionary, it passes the captured string on to other parts of the function and finally returns statement.
def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement
Question:
How can I make it accept partial matches, and replace the matching word, while keeping the rest of the string intact? Right now it only accepts literal matches.
Goal:
The string is "BULL GOOGLE X3 VON". I would like {"GOOG": "Google"} in the dictionary to be sufficient to transform the word to "Google". The transformed string would be "BULL Google X3 VON", and the function passes on "Google".
Note: I want to continue using dict for the implementation because other parts of the program depend on it.
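A minimal sketch of how the goal above might be reached while still keeping the dict: build one case-insensitive pattern from the dictionary keys and let it swallow the rest of the word. The names `lookup`, `pattern` and `replace_partial` are hypothetical and not part of the original code, and the sketch assumes that a key only needs to match at the start of a word.

```python
import re

# Same mapping as in the question; keys act as case-insensitive word prefixes.
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# Upper-cased copy of the keys so the lookup does not depend on input casing.
lookup = {key.upper(): name for key, name in dictionary.items()}

# One alternation of all keys, longest first so a longer key wins over a shorter one.
alternation = "|".join(re.escape(key) for key in sorted(lookup, key=len, reverse=True))
# \b anchors the key at a word start, \w* consumes the remainder of that word.
pattern = re.compile(rf"\b({alternation})\w*", flags=re.IGNORECASE)

def replace_partial(value):
    """Return (string with the first matching word replaced, replacement or "")."""
    m = pattern.search(value)
    if m is None:
        return value, ""
    replacement = lookup[m.group(1).upper()]
    return value[:m.start()] + replacement + value[m.end():], replacement
```

Called on "BULL GOOGLE X3 VON", this returns the rebuilt string together with the replacement word, so the second element can be passed on inside f in place of the current exact lookup.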
Code:
import re
import pandas as pd

#DataFrame
df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])
#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)
#Regex: captures the first whitespace-delimited word after the leading word
regex = re.compile(r"\s(\S+)\s", flags=re.IGNORECASE)
#Function
def f(value):
    #Only an exact key match succeeds here, e.g. "GOOGLE" is not in dictionary
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement
#Map Function
df["Statement"] = df["Name"].map(f)
Ideas:
If it's possible to modify the function directly to accept partial matches, that would be good.
Otherwise, a solution might be to first replace the matching word in the string, keeping the rest of the string intact, and then match the regex substring with the dictionary. These steps could happen in a temporary column so that the column "Name" is still in its original state for future use.
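The temporary-column idea could be sketched roughly as follows. The column names "Name_tmp" and "Match" and the helpers `lookup`, `pattern` and `swap` are made up for illustration, and the sketch assumes that a key only has to match the start of a word. Note that after the replacement the word in the string equals a dictionary value ("Google"), not a key ("GOOG"), so the second step checks against the values.

```python
import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# Upper-cased keys make the lookup independent of the casing in the data.
lookup = {key.upper(): name for key, name in dictionary.items()}
alternation = "|".join(re.escape(key) for key in sorted(lookup, key=len, reverse=True))
pattern = re.compile(rf"\b({alternation})\w*", flags=re.IGNORECASE)

def swap(match):
    # Replace the whole matched word with the dictionary value for its key.
    return lookup[match.group(1).upper()]

# Step 1: write the replaced string to a temporary column; "Name" stays untouched.
df["Name_tmp"] = df["Name"].map(lambda s: pattern.sub(swap, s))

# Step 2: pick the first word that is now one of the dictionary values.
names = set(lookup.values())
df["Match"] = df["Name_tmp"].map(
    lambda s: next((word for word in s.split() if word in names), "")
)
```

The temporary column can be dropped afterwards with df.drop(columns="Name_tmp") once "Match" has been passed on.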