I have a pandas DataFrame and want to select the rows where the value in one column starts with the value in another column. I have tried the following:

import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

df_subset = df[df['A'].str.startswith(df['B'])]

But it raises an error, and the solutions I found have not helped either:

KeyError: "None of [Float64Index([nan, nan, nan], dtype='float64')] are in the [columns]"

np.where(df['A'].str.startswith(df['B']), True, False), suggested in another post, also returns True for every row.

Reveille

3 Answers

For a row-wise comparison, we can use DataFrame.apply:

m = df.apply(lambda x: x['A'].startswith(x['B']), axis=1)
df[m]

       A    B
0  apple  app
2     aa   aa

Your code does not work because Series.str.startswith accepts a character sequence (a string scalar), but you are passing a pandas Series. Quoting the docs:

pat : str
Character sequence. Regular expressions are not accepted.
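
To make the difference concrete, here is a minimal sketch contrasting a scalar pattern with a Series pattern. It reuses the DataFrame from the question. What happens in the Series case is an assumption about version behaviour: older pandas releases return all-NaN (which matches the KeyError in the question), while newer releases raise TypeError, so the sketch guards for both.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# A single string pattern is tested against every element of column A.
scalar_mask = df['A'].str.startswith('a')
print(scalar_mask.tolist())

# Passing a Series as the pattern is not supported. Depending on the
# pandas version this either raises TypeError or returns all-NaN,
# and neither result is usable as a boolean mask.
try:
    series_result = df['A'].str.startswith(df['B'])
    print('all NaN:', series_result.isna().all())
except TypeError as exc:
    print('TypeError:', exc)
```

Either way, the pattern has to be a scalar, which is why the row-wise apply above is needed.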

Erfan
  • Brilliant! I did try `apply` with `lambda` too but failed to get it to work; I was missing `axis=1`. – Reveille Jun 23 '20 at 14:03
  • Yes, that can be confusing at first. The idea is that you want to apply your function to each row (so along the column axis) rather than to each column (along the index axis). Here `axis='columns'` would also work. – Erfan Jun 23 '20 at 14:19

You may need a for loop (here, a list comprehension), since str.startswith does not support a row-wise check against another column:

[x.startswith(y) for x, y in zip(df.A, df.B)]
Out[380]: [True, False, True]
df_sub = df[[x.startswith(y) for x, y in zip(df.A, df.B)]].copy()
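
The list comprehension above calls startswith on every value, so it would fail if either column contained a missing value. A possible variant, sketched here with an extra row containing None that is not part of the original data, treats any non-string entry as a non-match:

```python
import pandas as pd

# Same data as the question, plus one hypothetical row with a missing value.
df = pd.DataFrame({'A': ['apple', 'xyz', None, 'aa'],
                   'B': ['app', 'b', 'c', 'aa']})

# Only call startswith when both values are real strings;
# anything else (None, NaN) counts as "does not match".
mask = [isinstance(a, str) and isinstance(b, str) and a.startswith(b)
        for a, b in zip(df['A'], df['B'])]

df_sub = df[mask].copy()
print(df_sub)
```

Whether missing values should count as non-matches is a design choice; dropping them beforehand with dropna would be another option.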
BENY

You can achieve this without writing an explicit for loop:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

ufunc = np.frompyfunc(str.startswith, 2, 1)
idx = ufunc(df['A'], df['B'])
df[idx]

Out[22]: 
       A    B
0  apple  app
2     aa   aa
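
One detail worth knowing: np.frompyfunc always produces object-dtype results, and the Python function is still called once per element, so this is a convenience rather than true vectorization. A minimal sketch that works on plain NumPy arrays and casts the result to a real boolean mask (the cast is an optional precaution, not something the code above requires):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# Wrap str.startswith as a ufunc taking 2 inputs and returning 1 output.
starts_with = np.frompyfunc(str.startswith, 2, 1)

# The ufunc returns an object-dtype array of Python bools;
# cast it so that it is an unambiguous boolean mask.
raw = starts_with(df['A'].to_numpy(dtype=object),
                  df['B'].to_numpy(dtype=object))
mask = raw.astype(bool)

print(raw.dtype, mask.dtype)
print(df[mask])
```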
Woods Chen