This function looks at strings in a pandas DataFrame. If the regular expression captures a word that matches a key in the dictionary, the function passes the corresponding dictionary value on to the rest of the function and finally returns a statement.

def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

Question:

How can I make it accept partial matches, and replace the matching word, while keeping the rest of the string intact? Right now it only accepts literal matches.

Goal:

The string is "BULL GOOGLE X3 VON". I would like the entry {"GOOG": "Google"} in the dictionary to be sufficient to transform the word to "Google". The transformed string would be "BULL Google X3 VON", and the function passes on "Google".

Note: I want to continue using a dict for the implementation because other parts of the program depend on it.

Code:

import re
import pandas as pd

#DataFrame
df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex
regex = re.compile(r"\s(\S+)\s", flags=re.IGNORECASE)

#Function
def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

#Map Function
df["Statement"] = df["Name"].map(lambda x:f(x))

Ideas:

If it's possible to modify the function directly to accept partial matches, that would be good.

Otherwise, a solution might be to first replace the matching word in the string – keeping the rest of the string intact – and then match the regex substring with the dictionary. These steps could happen in a temporary column so that the column "Name" is still in its original state for future use.
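The second idea could be sketched roughly as follows. This is only a minimal sketch: the column names "Temp" and "Match" and the helpers replace_word and lookup are made up for illustration, and upper-casing the captured prefix assumes that all dictionary keys are upper case.

```python
import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# One alternation of all keys; a match is a whole word that starts with a key.
pattern = re.compile(
    r"\b(%s)\S*" % "|".join(re.escape(key) for key in dictionary),
    flags=re.IGNORECASE,
)

def replace_word(match):
    # Assumption: keys are upper case, so the captured prefix is upper-cased
    # before the lookup.
    return dictionary[match.group(1).upper()]

def lookup(value):
    # Return the dictionary value for the first matching word, or "" if none.
    match = pattern.search(value)
    return dictionary[match.group(1).upper()] if match else ""

# "Name" stays untouched; the rewritten string goes into a temporary column.
df["Temp"] = df["Name"].str.replace(pattern, replace_word, regex=True)
df["Match"] = df["Name"].map(lookup)
```

With the two sample rows, "Temp" would hold the rewritten strings and "Match" the values to pass on, while "Name" keeps its original contents.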

P A N

1 Answer

I think this might be what you are looking for.

import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex
# re.escape keeps keys that contain regex metacharacters from breaking the pattern
regex = re.compile(r"\b((%s)\S*)\b" % "|".join(map(re.escape, dictionary)), re.I)

def dictionary_lookup(match):
    return dictionary[match.group(2)]

#Function
def f(value):
    match = dictionary[regex.search(value).group(2)]
    #Do stuff
    statement = regex.sub(dictionary_lookup, value)
    return statement

#Map Function
df["Statement"] = df["Name"].map(lambda x:f(x))

This matches any word that starts with one of the keys in the dictionary, assigns the corresponding dictionary value to the variable match, and then returns the original string with the matched word replaced.
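Two cases are worth guarding against: a string in which no key occurs (regex.search then returns None), and a case-insensitive hit such as "goog", which is not itself a key. A more defensive variant might look like the sketch below. The upper-cased lookup table and the tuple return value are my own assumptions for illustration, not something the question requires.

```python
import re

dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# Upper-cased copy of the dict so that a case-insensitive hit can still be
# looked up. The original dictionary is left unchanged.
lookup = {key.upper(): name for key, name in dictionary.items()}
regex = re.compile(r"\b(%s)\S*" % "|".join(map(re.escape, lookup)), re.IGNORECASE)

def f(value):
    found = regex.search(value)
    if found is None:
        # No key occurs in the string: return it unchanged with an empty match.
        return value, ""
    match = lookup[found.group(1).upper()]
    # Replace every word that starts with a key by its dictionary value.
    statement = regex.sub(lambda m: lookup[m.group(1).upper()], value)
    return statement, match
```

Here f("BULL GOOGLE X3 VON") gives both the rewritten string and the value to pass on, and a string without any key comes back as it was.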

Joseph Stover