I have a function called postprocess that I apply to each DataFrame row. It uses a while loop to walk backwards from a start index for as long as it finds word characters, apostrophes, or hyphens. postprocess looks like this:

import re

def postprocess(description, start_index, end_index):
    # Start from the given bounds; only the start is widened below.
    new_start, new_end = start_index, end_index
    if 0 < new_start < len(description):
        # Step left while the previous and current characters are both
        # word characters, apostrophes, or hyphens (i.e. the same word).
        while re.match(r"[\w'-]", description[new_start - 1]) and re.match(
            r"[\w'-]", description[new_start]
        ):
            new_start -= 1
            if new_start == 0:
                break
    return description[new_start:new_end]

For example, suppose the description is credit payment velvet-burger, start_index is 22 and end_index is 27. Then description[start_index] is b, the b in burger. The while loop traces backwards from that position to recover the target substring, because burger on its own is incomplete: we also want velvet-. After running postprocess we get velvet-burger. The complete code looks like this:

df["target_substring"] = df.apply(lambda x: postprocess(
                         x["description"], x["start_index"], x["end_index"]+1),
                         axis=1)
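
A minimal reproducible sketch of that call, assuming a made-up two-row frame and an inclusive end_index (hence the + 1). The sample rows are hypothetical, and postprocess is repeated here so the snippet runs on its own:

```python
import re

import pandas as pd

WORD_CHAR = re.compile(r"[\w'-]")  # word character, apostrophe, or hyphen


def postprocess(description, start_index, end_index):
    new_start = start_index
    if 0 < new_start < len(description):
        # Walk left while the previous and current characters are in the same word.
        while WORD_CHAR.match(description[new_start - 1]) and WORD_CHAR.match(
            description[new_start]
        ):
            new_start -= 1
            if new_start == 0:
                break
    return description[new_start:end_index]


# Hypothetical sample data; each start_index points into the second half of a word.
df = pd.DataFrame(
    {
        "description": ["credit payment velvet-burger", "card refund ice-cream shop"],
        "start_index": [22, 16],
        "end_index": [27, 20],
    }
)

df["target_substring"] = df.apply(
    lambda x: postprocess(x["description"], x["start_index"], x["end_index"] + 1),
    axis=1,
)
print(df["target_substring"].tolist())  # ['velvet-burger', 'ice-cream']
```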

Is there a better way to write this code?

Chia Yi

2 Answers

You might also want to try iterrows() (see the pandas documentation for DataFrame.iterrows):

for rowindex, rowvalues in df.iterrows():
   # do stuff with rowvalues['description']...
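
Filled in, the loop might look like the sketch below. The sample frame is made up, and postprocess is a self-contained stand-in for the function in the question. Note that iterrows() is usually no faster than apply, so this is mainly a readability choice:

```python
import re

import pandas as pd

WORD_CHAR = re.compile(r"[\w'-]")  # word character, apostrophe, or hyphen


def postprocess(description, start_index, end_index):
    # Stand-in for the question's function: widen the start to the word boundary.
    new_start = start_index
    if 0 < new_start < len(description):
        while WORD_CHAR.match(description[new_start - 1]) and WORD_CHAR.match(
            description[new_start]
        ):
            new_start -= 1
            if new_start == 0:
                break
    return description[new_start:end_index]


# Hypothetical sample data.
df = pd.DataFrame(
    {
        "description": ["credit payment velvet-burger", "card refund ice-cream shop"],
        "start_index": [22, 16],
        "end_index": [27, 20],
    }
)

# Collect one result per row, then assign the whole column at once
# instead of writing into the frame inside the loop.
results = []
for rowindex, rowvalues in df.iterrows():
    results.append(
        postprocess(
            rowvalues["description"],
            rowvalues["start_index"],
            rowvalues["end_index"] + 1,
        )
    )
df["target_substring"] = results
```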
rgralma

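Also take a look at np.vectorize from the numpy module. It may speed up your code compared with a row-wise apply, although the NumPy documentation notes that it is provided mainly for convenience and is essentially a for loop, so it is worth timing on your own data. See the NumPy documentation for numpy.vectorize.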
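
A minimal sketch of how that could look, assuming a made-up sample frame and a self-contained stand-in for postprocess from the question. Passing otypes=[object] is a choice made here so the results stay ordinary Python strings:

```python
import re

import numpy as np
import pandas as pd

WORD_CHAR = re.compile(r"[\w'-]")  # word character, apostrophe, or hyphen


def postprocess(description, start_index, end_index):
    # Stand-in for the question's function: widen the start to the word boundary.
    new_start = start_index
    if 0 < new_start < len(description):
        while WORD_CHAR.match(description[new_start - 1]) and WORD_CHAR.match(
            description[new_start]
        ):
            new_start -= 1
            if new_start == 0:
                break
    return description[new_start:end_index]


# Hypothetical sample data.
df = pd.DataFrame(
    {
        "description": ["credit payment velvet-burger", "card refund ice-cream shop"],
        "start_index": [22, 16],
        "end_index": [27, 20],
    }
)

# np.vectorize wraps the scalar function so it accepts whole arrays;
# internally it still calls postprocess once per element.
vectorized_postprocess = np.vectorize(postprocess, otypes=[object])

df["target_substring"] = vectorized_postprocess(
    df["description"].to_numpy(),
    df["start_index"].to_numpy(),
    df["end_index"].to_numpy() + 1,  # end_index is assumed inclusive
)
```

This avoids building a Series object for every row, which is where a row-wise apply spends much of its time, but the per-element Python call remains.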

rgralma