Splitting strings in a series when a number is encountered in Python

Question

I have a series of strings in a dataframe, and I want to get rid of everything in the string once a number starts. Here's an example:

sstrings['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']

so, in the end, I want it to be:

sstrings['abc', 'def', 'dab', 'gh']

I thought about doing something like:

df['sstrings'] = df['sstrings'].str.split()

but since the leading number isn't always the same, I'm not sure how to make that work.

I saw this but that doesn't seem to work with a series.

Is there a way to do this without looping through the series and using re.split?

timgeb · Accepted Answer · 2014-07-09T18:08:20.407

3

You could use a regular expression. Demo:

>>> import re
>>> s = ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']    
>>> ss = [re.match(r'[^\d]+', x).group(0) for x in s]
>>> ss
['abc', 'def', 'dab', 'gh']

Explanation:

\d matches any digit.
[^\d] matches anything that is not a digit.
[^\d]+ matches a sequence of one or more non-digits.

The documentation for re.match can be found here. It returns a match object (from which we extract the matching string with group) if one or more characters at the beginning of the string match our pattern [^\d]+, and None otherwise. The list comprehension applies re.match to every x in your original list s.
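One caveat: with [^\d]+, a string that starts with a digit produces no match, so re.match returns None and calling group on it raises AttributeError. A minimal sketch of one way around that, assuming an empty string is an acceptable result in that case (the helper name prefix_before_digit and the extra sample strings are made up for illustration):

```python
import re

def prefix_before_digit(text):
    # Hypothetical helper. \D* matches zero or more non-digits, so
    # re.match always succeeds here and group(0) may be an empty string.
    return re.match(r'\D*', text).group(0)

# Sample data (assumed): includes a digits-only and a digit-free string.
s = ['abc12390859', 'def1959836', '12345', 'nodigits']
ss = [prefix_before_digit(x) for x in s]
print(ss)
```

Here \D is shorthand for [^\d]; switching from + to * is the only change to the pattern.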

edited Jul 09 '14 at 18:08

answered Jul 09 '14 at 17:52

timgeb


That works! Thanks. Can you explain the code for it? – PointXIV Jul 09 '14 at 17:57
@PointXIV certainly, is it enough to explain the regular expression, or do you need an explanation for list comprehensions too? – timgeb Jul 09 '14 at 18:00
Tip: you can use `s.str.extract()` to do this directly on a pandas Series: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.strings.StringMethods.extract.html – joris Jul 09 '14 at 18:00
just the regular expression is fine. Thanks! – PointXIV Jul 09 '14 at 18:02
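Following up on the `s.str.extract()` tip, here is a rough sketch of how this might look on the DataFrame column from the question. It assumes pandas is installed and a reasonably recent version (the `expand` and `regex` keywords are not available in very old releases); the new column names `prefix` and `prefix_alt` are made up for illustration.

```python
import pandas as pd

# Sample frame mirroring the question's column (assumed layout).
df = pd.DataFrame(
    {'sstrings': ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']}
)

# Capture the leading run of non-digits; expand=False returns a Series.
# Rows that start with a digit would come back as NaN.
df['prefix'] = df['sstrings'].str.extract(r'^(\D+)', expand=False)

# Alternative: delete everything from the first digit onward.
# Rows without any digit are left unchanged.
df['prefix_alt'] = df['sstrings'].str.replace(r'\d.*$', '', regex=True)

print(df)
```

Both variants avoid an explicit Python loop; which one fits better depends on how rows with no digits, or with leading digits, should be treated.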

score 0 · Answer 2 · answered Jul 09 '14 at 19:39

If the final part of each string consists only of digits, you can use:

>>> lst = ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']
>>> list(map(lambda txt: txt.rstrip("0123456789"), lst))
['abc', 'def', 'dab', 'gh']

or using list comprehension:

>>> [txt.rstrip("0123456789") for txt in lst]
['abc', 'def', 'dab', 'gh']
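Note that rstrip only removes trailing digits, so it differs from cutting at the first digit when digits also appear in the middle of a string. A small sketch of the difference (the string 'a1b22' is an invented example, not from the question):

```python
import re

# 'a1b22' is a made-up case with a digit in the middle.
lst = ['abc12390859', 'a1b22', 'gh34643267']

# Strip only the digits at the end of each string.
trailing_removed = [txt.rstrip("0123456789") for txt in lst]

# Keep only what comes before the first digit.
cut_at_first_digit = [re.match(r'\D*', txt).group(0) for txt in lst]

print(trailing_removed)
print(cut_at_first_digit)
```

For the data in the question both approaches give the same result.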
