Exploring alternatives to apply conditions to every row of dataframe other than pandas apply

Question

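I have a function called postprocess that I apply to every row of a dataframe. It uses a while loop to walk backwards from start_index over letters, digits, apostrophes and hyphens (-), so that the returned substring starts at the beginning of a word. postprocess looks like this: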

import re

def postprocess(description, start_index, end_index):
    # Move start_index back while both the previous and the current
    # character are word characters, apostrophes or hyphens.
    if 0 < start_index < len(description):
        while re.match(r"\w|\'|-", description[start_index - 1]) and re.match(
            r"\w|\'|-", description[start_index]
        ):
            start_index = start_index - 1
            if start_index == 0:
                break
    # Slice from the (possibly moved) start up to, but excluding, end_index.
    return description[start_index:end_index]

For example, take the description credit payment velvet-burger, with start_index and end_index marking burger (7 and 12 when counted within velvet-burger alone; 22 and 27 within the full description). description[start_index] is then the b in burger. Because burger on its own is incomplete (we also want velvet-), the while loop traces backwards from b to the start of the word, so postprocess returns velvet-burger. The complete code looks like this:

df["target_substring"] = df.apply(lambda x: postprocess(
                         x["description"], x["start_index"], x["end_index"]+1),
                         axis=1)

Is there a better way to write this code?

If your code works well and there is no issue, please consider posting the question at [codereview.se]. — Wiktor Stribiżew, Mar 05 '20 at 08:50

score 0 · Answer 1 · answered Mar 05 '20 at 08:33

You might also want to try iterrows() (documentation)

for rowindex, rowvalues in df.iterrows():
   # do stuff with rowvalues['description']...
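
As a minimal sketch of how that loop could be filled in for this question: the postprocess below is a cleaned-up version of the asker's function, and the two sample rows (including the second description and its indices) are made up for illustration. Note that iterrows() builds a Series for every row, so it is usually not faster than apply; it mainly makes the per-row logic explicit.

```python
import re

import pandas as pd


def postprocess(description, start_index, end_index):
    # Cleaned-up version of the function from the question.
    if 0 < start_index < len(description):
        while re.match(r"\w|\'|-", description[start_index - 1]) and re.match(
            r"\w|\'|-", description[start_index]
        ):
            start_index -= 1
            if start_index == 0:
                break
    return description[start_index:end_index]


# Hypothetical sample data; indices are counted within the full description.
df = pd.DataFrame(
    {
        "description": ["credit payment velvet-burger", "debit coffee-shop visit"],
        "start_index": [22, 13],
        "end_index": [27, 16],
    }
)

results = []
for rowindex, rowvalues in df.iterrows():
    # end_index is inclusive in the data, so add 1 for slicing.
    results.append(
        postprocess(
            rowvalues["description"],
            rowvalues["start_index"],
            rowvalues["end_index"] + 1,
        )
    )

df["target_substring"] = results
print(df["target_substring"].tolist())
```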

rgralma

score 0 · Answer 2 · answered Mar 06 '20 at 11:34

Also take a look at np.vectorize from the numpy module. It can sometimes speed up row-wise code compared with apply(axis=1), although it still calls the Python function once per row. Check it here
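
A rough sketch of what that might look like here, assuming a cleaned-up version of the asker's postprocess and two made-up sample rows. The NumPy documentation describes np.vectorize as provided primarily for convenience rather than performance, so any gain over apply is worth measuring on real data. otypes=[object] is passed so the returned strings are kept as ordinary Python objects.

```python
import re

import numpy as np
import pandas as pd


def postprocess(description, start_index, end_index):
    # Cleaned-up version of the function from the question.
    if 0 < start_index < len(description):
        while re.match(r"\w|\'|-", description[start_index - 1]) and re.match(
            r"\w|\'|-", description[start_index]
        ):
            start_index -= 1
            if start_index == 0:
                break
    return description[start_index:end_index]


# Hypothetical sample data; indices are counted within the full description.
df = pd.DataFrame(
    {
        "description": ["credit payment velvet-burger", "debit coffee-shop visit"],
        "start_index": [22, 13],
        "end_index": [27, 16],
    }
)

# Wrap the scalar function so it accepts whole columns at once.
vectorized_postprocess = np.vectorize(postprocess, otypes=[object])

df["target_substring"] = vectorized_postprocess(
    df["description"].to_numpy(),
    df["start_index"].to_numpy(),
    df["end_index"].to_numpy() + 1,  # end_index is inclusive in the data
)
print(df["target_substring"].tolist())
```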

rgralma
