I have a function called `postprocess` that uses a `while` loop to step backwards over hyphens, apostrophes and word characters (letters, digits and underscores), and I apply it to each row of a DataFrame. `postprocess` looks like this:
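```python
import re


def postprocess(description, start_index, end_index):
    # Only adjust start_index if it points inside the string (and not at position 0).
    if 0 < start_index < len(description):
        # Step back while both the previous and the current character are word
        # characters, apostrophes or hyphens, i.e. we are still inside the same word.
        while re.match(r"\w|'|-", description[start_index - 1]) and re.match(
            r"\w|'|-", description[start_index]
        ):
            start_index = start_index - 1
            # Stop at the start of the string; index -1 would wrap around to the end.
            if start_index == 0:
                break
    return description[start_index:end_index]
```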
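A shorter way to express the same backward scan is a single regex that finds the run of word characters ending right before `start_index`, instead of stepping back one character at a time. This is only a sketch: it assumes the same character set as the original pattern (`\w`, apostrophe, hyphen) and an exclusive `end_index`; the names `WORD_CHAR` and `TRAILING_WORD` are illustrative, not from the original code.

```python
import re

# Same character set as the original pattern: word characters, apostrophe, hyphen.
WORD_CHAR = re.compile(r"[\w'-]")
# A (possibly empty) run of those characters that ends at the end of the searched range.
TRAILING_WORD = re.compile(r"[\w'-]*\Z")


def postprocess(description: str, start_index: int, end_index: int) -> str:
    """Move start_index back to the start of the word it points into,
    then return description[start:end_index] (end_index is exclusive)."""
    start = int(start_index)
    if 0 < start < len(description) and WORD_CHAR.match(description[start]):
        # endpos=start makes \Z match right before start_index, so the match
        # covers the word characters immediately preceding it.
        start = TRAILING_WORD.search(description, 0, start).start()
    return description[start:int(end_index)]


print(postprocess("credit payment velvet-burger", 22, 28))  # velvet-burger
```

`Pattern.search(string, pos, endpos)` treats the string as if it ended at `endpos`, which is what lets `\Z` anchor the match at `start_index` without slicing the string.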
For example, the description is `credit payment velvet-burger`, `start_index` is 22 and `end_index` is 27 (`end_index` is inclusive, which is why the call below passes `end_index + 1`). So `description[start_index]` is `b`, the first letter of `burger`. `burger` on its own is not the complete word, because we also want the `velvet-` in front of it, so the `while` loop traces backwards from the `b` until it reaches the start of the word.

After running `postprocess` we get `velvet-burger`.
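To double-check which indices correspond to `burger` in this description, and how far the backward walk goes, here is a small standalone trace of the same loop condition. It does not call `postprocess`; it just repeats its loop on the example string, with `start > 0` in the condition taking the place of the `break`.

```python
import re

description = "credit payment velvet-burger"
start_index = description.index("burger")     # position of the "b"
end_index = start_index + len("burger") - 1    # inclusive end: position of the last "r"

word_char = re.compile(r"[\w'-]")
start = start_index
# Step back while both the previous and the current character are word characters.
while start > 0 and word_char.match(description[start - 1]) and word_char.match(description[start]):
    start -= 1

print(start_index, end_index, start)           # 22 27 15
print(description[start:end_index + 1])        # velvet-burger
```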
I apply it to the DataFrame like this:

```python
df["target_substring"] = df.apply(
    lambda x: postprocess(x["description"], x["start_index"], x["end_index"] + 1),
    axis=1,
)
```
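`DataFrame.apply(..., axis=1)` builds a `Series` object for every row, which tends to be slow on large frames. A common alternative is a plain loop (or list comprehension) over `zip` of the three columns. The sketch below is self-contained, so it repeats a regex-based version of the start adjustment; `target_substrings` is a hypothetical name, and it assumes `end_index` is inclusive, as in the `apply` call above.

```python
import re

# Word characters, apostrophe, hyphen (same set as the original regex).
WORD_CHAR = re.compile(r"[\w'-]")
# Run of those characters ending at the end of the searched range.
TRAILING_WORD = re.compile(r"[\w'-]*\Z")


def target_substrings(descriptions, start_indices, end_indices):
    """Return one target substring per row; end indices are inclusive."""
    out = []
    for desc, start, end in zip(descriptions, start_indices, end_indices):
        start, end = int(start), int(end)
        if 0 < start < len(desc) and WORD_CHAR.match(desc[start]):
            # Jump back to the first character of the word containing desc[start].
            start = TRAILING_WORD.search(desc, 0, start).start()
        out.append(desc[start:end + 1])
    return out


print(target_substrings(["credit payment velvet-burger"], [22], [27]))  # ['velvet-burger']
```

With pandas, the columns can be passed directly, since iterating a `Series` yields its values: `df["target_substring"] = target_substrings(df["description"], df["start_index"], df["end_index"])`.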
Is there a better way to write this code?