pandas select from Dataframe using startswith

Question

This works (using pandas 0.12 dev):

table2=table[table['SUBDIVISION'] =='INVERNESS']

Then I realized I needed to select on the field using "starts with", since the exact match was missing a bunch of rows. So, following the pandas docs as closely as I could, I tried:

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternate syntax with the same result

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing (section 4: "List comprehensions and map method of Series can also be used to produce more complex criteria").

What am I missing?

Could you give a small example which demonstrates this? I'm surprised that the list comprehension wouldn't raise in the same way as the map... – Andy Hayden, Jul 30 '13 at 22:21

score 127 · Accepted Answer · edited Apr 12 '20 at 21:31

You can use the Series method str.startswith (reached through the .str accessor on the column) to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object

It looks like at least one of the elements in your Series/column is a float (most likely a NaN), which doesn't have a startswith method, hence the AttributeError. The list comprehension should raise the same error.
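A minimal sketch of why both attempts in the question fail, assuming a made-up column with one missing value (the sample values are hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical column: two strings and one missing value, kept as object dtype.
s = pd.Series(["INVERNESS POINT", "MT HIGH", np.nan], dtype=object)

# The missing value is a float NaN, and floats have no .startswith method.
missing_is_float = isinstance(s.iloc[2], float)


def raises_attribute_error(func):
    """Return True if calling func() raises AttributeError."""
    try:
        func()
    except AttributeError:
        return True
    return False


# Both the map version and the list comprehension hit the NaN and raise.
map_fails = raises_attribute_error(
    lambda: s.map(lambda x: x.startswith("INVERNESS"))
)
listcomp_fails = raises_attribute_error(
    lambda: [x.startswith("INVERNESS") for x in s]
)

print(missing_is_float, map_fails, listcomp_fails)
```

The vectorised `s.str.startswith('INVERNESS', na=False)` avoids this because non-string entries are treated as missing and filled with the `na` value instead of being called directly.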

edited Apr 12 '20 at 21:31

Neil

answered Jul 30 '13 at 22:16

Andy Hayden

Thank you for your reply... Don't seem to be getting there. Tried `table['SUBDIVISION']str.startswith('INVERNESS', na=False)` and got `SyntaxError: invalid syntax`. Wondering if I did not import something essential? I don't get it, since my straight == condition works fine. – dartdog Jul 31 '13 at 02:01

And if I try table.loc[table['SUBDIVISION'].str.startswith('INVERNESS', na=False)] I get a good result!! But I still don't get what is wrong with the prior attempts? – dartdog Jul 31 '13 at 02:16

@dartdog you were missing a dot. please include a small subset of the data which demonstrates the problem (it seems hard to believe :s) – Andy Hayden Jul 31 '13 at 08:12
Sorry, the data is 27 cols long, kind of unwieldy even to post a clip. I tried `table['SUBDIVISION'].str.startswith('INVERNESS',na='False')` with another '.' and the comparison came out bad (all selected False). I still don't get why my original syntax failed, since I thought I was following the documentation. – dartdog Jul 31 '13 at 11:39
@dartdog presumably it's only the SUBDIVISION column which is relevant, so just paste that. – Andy Hayden Jul 31 '13 at 11:43
0 MT HIGH 1 LAKE CYRUS 2 TRUSSVILLE 3 CLANTON 4 BEAVER CREEK Name: SUBDIVISION, dtype: object – dartdog Jul 31 '13 at 11:53
Do you mind just pastebinning SUBDIVISION, I can't read_clipboard that. – Andy Hayden Jul 31 '13 at 12:25
http://pastebin.com/imjNEwg3 see also pasted data above MT HIGH.... the pastebin is the properly selected data. the data above is a sample of the "unselected" – dartdog Jul 31 '13 at 12:51
@dartdog can't replicate, perhaps output of .to_dict() would help – Andy Hayden Jul 31 '13 at 12:55
Asking for more help here https://groups.google.com/d/topic/pydata/_QX9YiPZldw/discussion – dartdog Jul 31 '13 at 14:05

score 32 · Answer 2 · answered Mar 25 '18 at 16:31

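To retrieve all the rows whose value starts with the required string (note that str.match treats its argument as a regular expression matched at the start of each value):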

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all the rows whose value contains the required string anywhere:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]

answered Mar 25 '18 at 16:31

Vinoj John Hosan

Why use the `str.match()` function to determine whether a value starts with a specific string when you could use `str.startswith()`? `str.match()` is for matching a value against a regular expression. If you don't need a regular expression, using that function **_may_** make your code slower than necessary. – Mr. Lance E Sloan Jul 01 '18 at 01:16
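The difference between these methods can be sketched on a small made-up Series (the values and the pattern "a." are illustrative assumptions, chosen because "." is a regex metacharacter):

```python
import pandas as pd

# Hypothetical values; "." means "any character" in a regular expression.
s = pd.Series(["a.b", "axb", "xa.b"])

# Literal prefix test: only "a.b" really starts with the two characters "a.".
literal_prefix = s.str.startswith("a.")

# Regex anchored at the start: "a." also matches "ax".
regex_prefix = s.str.match("a.")

# Regex anywhere in the value: matches all three.
regex_anywhere = s.str.contains("a.")

# Literal substring anywhere: regex=False turns off pattern matching.
literal_anywhere = s.str.contains("a.", regex=False)

print(pd.DataFrame({
    "value": s,
    "startswith": literal_prefix,
    "match": regex_prefix,
    "contains": regex_anywhere,
    "contains_literal": literal_anywhere,
}))
```

So `str.match` and `str.startswith` only agree when the search string contains no regex metacharacters.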

score 15 · Answer 3 · answered Jan 21 '20 at 05:25

Using startswith for a particular column value

df  = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]
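For an end-to-end picture, here is a small sketch on a made-up table (the rows and the PRICE column are hypothetical) comparing the exact `==` match from the question with the prefix match:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one exact match, one prefix-only match, one missing value.
table = pd.DataFrame({
    "SUBDIVISION": ["INVERNESS", "INVERNESS POINT", "MT HIGH", np.nan, "LAKE CYRUS"],
    "PRICE": [100, 200, 300, 400, 500],
})

# Exact comparison only finds the row that equals "INVERNESS".
exact = table[table["SUBDIVISION"] == "INVERNESS"]

# Prefix comparison; na=False makes the missing value count as "no match".
mask = table["SUBDIVISION"].str.startswith("INVERNESS", na=False)
table2 = table.loc[mask]

print(len(exact), len(table2))
```

Without `na=False` the mask would contain a missing value for the NaN row, and boolean indexing with it would fail.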

answered Jan 21 '20 at 05:25

Saurabh

score 6 · Answer 4 · edited Dec 22 '19 at 08:18

You can use apply to easily apply any string matching function to your column elementwise.

table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes that every value in your "SUBDIVISION" column is a string; a missing value (NaN) or a number will raise the same AttributeError as in the question.
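If the column might hold non-string values, one possible variation of the apply approach is to check the type first. This is only a sketch: the helper name `starts_with_inverness` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a missing value and a number.
table = pd.DataFrame({"SUBDIVISION": ["INVERNESS COVE", "CLANTON", np.nan, 11]})


def starts_with_inverness(value):
    """Return True only for strings that start with 'INVERNESS'."""
    return isinstance(value, str) and value.startswith("INVERNESS")


# apply calls the helper on every element; non-strings simply yield False.
criteria = table["SUBDIVISION"].apply(starts_with_inverness)
table2 = table[criteria]

print(table2)
```

This behaves like `str.startswith('INVERNESS', na=False)` for this data, but runs a Python function per element, so the `.str` version is usually the simpler choice.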

Edit: fixed missing parenthesis

edited Dec 22 '19 at 08:18

answered May 10 '19 at 08:48

AleAve81

This worked for me once I added another closing bracket `table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS')]]` – avirr Oct 17 '19 at 02:43

score 2 · Answer 5 · answered May 09 '22 at 19:54

This can also be achieved using query:

table.query('SUBDIVISION.str.startswith("INVERNESS").values')

answered May 09 '22 at 19:54

rachwa