
This works (using pandas 0.12 dev):

table2=table[table['SUBDIVISION'] =='INVERNESS']

Then I realized I needed to select the field using "starts with", since I was missing a bunch of matches. So, following the pandas docs as closely as I could, I tried:

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternative syntax, with the same result:

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing (Boolean indexing, section 4): "List comprehensions and map method of Series can also be used to produce more complex criteria:"

What am I missing?

dartdog
  • Could you give a small example which demonstrates this? I'm surprised that the list comprehension wouldn't raise in the same way as the map... – Andy Hayden Jul 30 '13 at 22:21

5 Answers


You can use the str.startswith Series method (via the .str accessor) to get more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object
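Applied to a DataFrame column, the same pattern filters whole rows. A minimal sketch, assuming a small hypothetical stand-in for the question's `table` (the PRICE column and all values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `table`; only SUBDIVISION matters here.
table = pd.DataFrame({
    "SUBDIVISION": ["INVERNESS", "INVERNESS COVE", "MT HIGH", np.nan],
    "PRICE": [100, 200, 300, 400],
})

# na=False turns the NaN result for the missing value into False,
# so the mask is purely boolean and can be passed to .loc.
mask = table["SUBDIVISION"].str.startswith("INVERNESS", na=False)
table2 = table.loc[mask]
print(table2)
```

Rows 0 and 1 are kept; the NaN row is dropped rather than raising.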


It looks like at least one of the elements in your Series/column is a float (missing values are typically stored as NaN, which is a float). A float doesn't have a startswith method, hence the AttributeError; the list comprehension should raise the same error...
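To make the cause concrete, here is a small reproduction. It is a sketch on hypothetical data, assuming the real column contains at least one missing value; dtype=object is used to mimic a classic pandas string column.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value; dtype=object mimics a classic
# pandas string column, where missing entries are stored as NaN.
s = pd.Series(["INVERNESS", np.nan, "MT HIGH"], dtype=object)

missing = s[1]
print(type(missing))  # <class 'float'>: NaN is a float

# Calling the str method on every element fails as soon as it reaches the NaN.
error_message = None
try:
    s.map(lambda x: x.startswith("INVERNESS"))
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The .str accessor only calls startswith on strings; na=False turns the
# result for the missing value into False.
safe = s.str.startswith("INVERNESS", na=False).tolist()
print(safe)  # [True, False, False]
```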

Neil
Andy Hayden
  • Thank you for your reply... I don't seem to be getting there. I tried table['SUBDIVISION']str.startswith('INVERNESS', na=False) and got table['SUBDIVISION']str.startswith('INVERNESS', na=False) ^ SyntaxError: invalid syntax. Wondering if I did not import something essential? I don't get it, since my straight == condition works fine. – dartdog Jul 31 '13 at 02:01
  • And if I try table.loc[table['SUBDIVISION'].str.startswith('INVERNESS', na=False)] I get a good result!! But I still don't get what is wrong with the prior attempts? – dartdog Jul 31 '13 at 02:16
  • @dartdog you were missing a dot. Please include a small subset of the data which demonstrates the problem (it seems hard to believe :s) – Andy Hayden Jul 31 '13 at 08:12
  • Sorry, the data is 27 cols long, kind of unwieldy even to post a clip. I tried this: table['SUBDIVISION'].str.startswith('INVERNESS',na='False') with the added '.', and the comparison came out bad (all selected false). I still don't get why my original syntax failed, since I thought I was following the documentation. – dartdog Jul 31 '13 at 11:39
  • @dartdog presumably it's only the SUBDIVISION column which is relevant, so just paste that. – Andy Hayden Jul 31 '13 at 11:43
  • 0 MT HIGH 1 LAKE CYRUS 2 TRUSSVILLE 3 CLANTON 4 BEAVER CREEK Name: SUBDIVISION, dtype: object – dartdog Jul 31 '13 at 11:53
  • Do you mind just pastebinning SUBDIVISION, I can't read_clipboard that. – Andy Hayden Jul 31 '13 at 12:25
  • http://pastebin.com/imjNEwg3 see also pasted data above MT HIGH.... the pastebin is the properly selected data. the data above is a sample of the "unselected" – dartdog Jul 31 '13 at 12:51
  • @dartdog can't replicate, perhaps output of .to_dict() would help – Andy Hayden Jul 31 '13 at 12:55
  • Asking for more help here https://groups.google.com/d/topic/pydata/_QX9YiPZldw/discussion – dartdog Jul 31 '13 at 14:05

To retrieve all the rows that start with the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all the rows that contain the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
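Note that str.match and str.contains treat their argument as a regular expression by default, while str.startswith compares a literal prefix. A small sketch on hypothetical values showing where the results differ (the "." in the pattern is a regex wildcard):

```python
import pandas as pd

# Hypothetical values chosen to expose the regex vs. literal difference.
s = pd.Series(["INVERNESS", "INVERNESS COVE", "LAKE INVERNESS", "INVxRNESS"])

# str.match anchors a regular expression at the start of each string,
# so "." matches any character, including the "x" in "INVxRNESS".
by_match = s.str.match("INV.RNESS").tolist()

# str.startswith compares a literal prefix, so "." only matches a real ".".
by_literal_dot = s.str.startswith("INV.RNESS").tolist()
by_prefix = s.str.startswith("INVERNESS").tolist()

# str.contains searches anywhere in the string; regex=False makes the
# pattern literal.
by_contains = s.str.contains("INVERNESS", regex=False).tolist()

print(by_match)        # [True, True, False, True]
print(by_literal_dot)  # [False, False, False, False]
print(by_prefix)       # [True, True, False, False]
print(by_contains)     # [True, True, True, False]
```

If the column may hold NaN, all three methods also accept na=False, as in the accepted answer.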
Vinoj John Hosan
  • Why use the `str.match()` function to determine whether a value starts with a specific string when you could use `str.startswith()`? `str.match()` is for matching a value against a regular expression. If you don't need a regular expression, using that function **_may_** make your code slower than necessary. – Mr. Lance E Sloan Jul 01 '18 at 01:16

Using startswith on a particular column's values:

df  = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]
Saurabh

You can use apply to easily apply any string matching function to your column elementwise.

table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes that every value in your "SUBDIVISION" column is a string. A missing value (NaN, which is a float) will raise the same AttributeError as in the question.

Edit: fixed missing parenthesis
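If the column may contain NaN, one option is to guard the element-wise function so non-strings simply return False. A minimal sketch on hypothetical data; the helper name starts_with_inverness is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, like the question's column seems to have.
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", np.nan, "MT HIGH", "INVERNESS COVE"]},
    dtype=object,
)

# Hypothetical helper: non-strings (e.g. NaN) return False instead of
# raising AttributeError.
def starts_with_inverness(x):
    return isinstance(x, str) and x.startswith("INVERNESS")

table2 = table[table["SUBDIVISION"].apply(starts_with_inverness)]
print(table2.index.tolist())  # [0, 3]
```

This keeps the flexibility of apply (any Python predicate works), but for a plain prefix test the vectorized str.startswith(..., na=False) is shorter.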

AleAve81
  • This worked for me once I added another closing bracket `table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS')]]` – avirr Oct 17 '19 at 02:43

This can also be achieved using query:

table.query('SUBDIVISION.str.startswith("INVERNESS").values')
rachwa