Extract a substring in a dataframe based on start and stop index positions defined in two other columns

Question

df=  "start", "stop", "Seq"
   50       121   aaaaaaaaaaaaabbbbbbbbbbbbcccccccccc...dddddd
   25       150   aaaaahhhhhhhssssssssssssssccccccccc...dddddd

I need to extract a substring from the "Seq" column of the dataframe (df) using str.slice(start=start, stop=stop), where the start and stop values for each row come from the columns named "start" and "stop".

I would like to use a def function or a lambda, but I get errors:

def f(x,y,z):
    return z.str.slice(start=x, stop=y)
df.apply(lambda x: f(x["start"],x["stop"],x["Seq"]))

Output: KeyError: ('start', 'occurred at index id')

Possible duplicate of [Pandas substring using another column as the index](https://stackoverflow.com/questions/56605509/pandas-substring-using-another-column-as-the-index) — Georgy, Oct 30 '19 at 14:19

Erfan · Accepted Answer · 2019-10-30T14:33:47.730

Use .apply with axis=1 so the function receives each row, and slice that row's string in the form string[start:stop]:

df.apply(lambda x: x['Seq'][x['start']:x['stop']], axis=1)

0      aaabbbbbbbb
1    sssssssssssss
dtype: object
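
As background on the KeyError in the question: with the default axis=0, apply hands each column to the function, so a key such as 'start' is looked up among the row labels and is not found. The following is a minimal sketch of that difference, assuming the input dataframe shown at the end of this answer; the strings are rebuilt from assumed character counts, and the variable names raised and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the input dataframe in this answer
# (character counts are an assumption based on the printed strings).
df = pd.DataFrame({
    "start": [10, 12],
    "stop": [21, 25],
    "Seq": [
        "a" * 13 + "b" * 12 + "c" * 10 + "d" * 6,
        "a" * 5 + "h" * 7 + "s" * 14 + "c" * 9 + "d" * 6,
    ],
})

# Default axis=0: the lambda receives each column as a Series whose index
# is the row labels (0, 1), so looking up a column name raises KeyError.
try:
    df.apply(lambda x: x["Seq"][x["start"]:x["stop"]])
    raised = False
except KeyError:
    raised = True

# axis=1: the lambda receives each row, so column names work as keys.
result = df.apply(lambda x: x["Seq"][x["start"]:x["stop"]], axis=1)
print(raised)
print(result.tolist())
```

Note that inside the row-wise function x['Seq'] is a plain Python string, so ordinary slicing applies and the .str accessor from the question is not needed there.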

If you want to define a function:

def slice_str(string, start, stop):
    return string[start:stop]

df.apply(lambda x: slice_str(x['Seq'], x['start'], x['stop']), axis=1)

Or use zip with a list comprehension:

slices = [string[start:stop] for string, start, stop
          in zip(df['Seq'], df['start'], df['stop'])]

['aaabbbbbbbb', 'sssssssssssss']
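
If the slices should live alongside the original data, the list can be assigned back as a new column. This is a minimal sketch under the same assumed sample data; the column name "sub" is hypothetical and not part of the original question. The zip version is often faster than apply with axis=1 on large frames because it does not build a Series per row, though that is worth measuring on your own data.

```python
import pandas as pd

# Sample data rebuilt from the input dataframe in this answer
# (character counts are an assumption based on the printed strings).
df = pd.DataFrame({
    "start": [10, 12],
    "stop": [21, 25],
    "Seq": [
        "a" * 13 + "b" * 12 + "c" * 10 + "d" * 6,
        "a" * 5 + "h" * 7 + "s" * 14 + "c" * 9 + "d" * 6,
    ],
})

# Walk the three columns in parallel and slice each string with its own bounds.
# "sub" is a hypothetical column name chosen for this example.
df["sub"] = [
    seq[start:stop]
    for seq, start, stop in zip(df["Seq"], df["start"], df["stop"])
]
print(df["sub"].tolist())
```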

Input dataframe used:

   start  stop                                        Seq
0     10    21  aaaaaaaaaaaaabbbbbbbbbbbbccccccccccdddddd
1     12    25  aaaaahhhhhhhsssssssssssssscccccccccdddddd
