How to split strings into text and number?

Question

I'd like to split strings like these

'foofo21'
'bar432'
'foobar12345'

into

['foofo', '21']
['bar', '432']
['foobar', '12345']

Does anybody know a simple way to do this in Python?

score 81 · Accepted Answer · edited Aug 01 '20 at 19:31

I would approach this by using re.match in the following way:

import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)  # ('foofo', '21')

edited Aug 01 '20 at 19:31 · Ehsan Tabatabaei
answered Jan 09 '09 at 23:12 · Evan Fosmark

you probably want \w instead of [a-z] and \d instead of [0-9] – Dan Jan 09 '09 at 23:22
@Dan: Using \w is a poor choice as it matches all alphanumeric characters, not just a-z. So, the entire string would be caught in the first group. – Evan Fosmark Jan 09 '09 at 23:30
Not if you match it ungreedy as I do in my answer. – PEZ Jan 09 '09 at 23:46
@Bernard, notice the `re.I` at the end. That makes case a non-issue. – Evan Fosmark Jan 10 '09 at 00:56
You might get some false positives using this method. If you tried m = r.match("abc123def"), then m.groups() would get you ('abc', '123'). That's because re.match() matches from the beginning of a string but doesn't need to match the entire string. – eksortso Jan 10 '09 at 01:24
If that's a concern, you can tack '\b' (IIRC) at the end, to specify that the match must end at a word boundary (or '$' to match the end of the string). – Jeff Shannon Jan 10 '09 at 08:17
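A minimal sketch of the anchoring idea from the comments above, assuming Python 3.4+ where re.fullmatch is available. The helper name split_text_number is hypothetical and not from the thread; it rejects strings such as "abc123def" instead of silently returning the first pair.

```python
import re

# Same pattern as the accepted answer; compiled once for reuse.
PATTERN = re.compile(r"([a-z]+)([0-9]+)", re.I)

def split_text_number(s):
    """Return (text, number) only if the whole string is letters followed by digits."""
    m = PATTERN.fullmatch(s)  # unlike match(), the entire string must fit the pattern
    return m.groups() if m else None

print(split_text_number("foofo21"))    # ('foofo', '21')
print(split_text_number("abc123def"))  # None
```

Appending '$' to the pattern and keeping re.match should behave much the same for single-line input.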
How can this be extended to str-digit-str-digit such as p6max20 to get p=6, max=20? "( )( )( )( )" four grouping? – Joonho Park Oct 05 '20 at 08:32
`re.split('(\d+)', t)` – BERA Dec 14 '20 at 17:31
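For the str-digit-str-digit case asked about above, one possible sketch is to let re.findall return every (letters, digits) pair rather than writing four groups by hand. The function name split_pairs is an assumption made for illustration.

```python
import re

def split_pairs(s):
    """Return every (letters, digits) pair found in s, left to right."""
    return re.findall(r"([a-z]+)([0-9]+)", s, flags=re.I)

pairs = split_pairs("p6max20")
print(pairs)        # [('p', '6'), ('max', '20')]
print(dict(pairs))  # {'p': '6', 'max': '20'}
```

The re.split suggestion gives a flat list instead (['p', '6', 'max', '20', ''] for this input), with a trailing empty string to drop.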

score 61 · Answer 2 · edited Aug 11 '22 at 11:47

def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail

>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

edited Aug 11 '22 at 11:47 · Giorgos Xou
answered Jan 10 '09 at 06:17 · Mike

Comparing timing for this answer to the [accepted answer](https://stackoverflow.com/a/430102/3585557), on my machine, using a single example (case study, not representative of all uses), this `str().rstrip()` method was roughly 4x faster. Also, it does not require another import. – Steven C. Howell May 29 '18 at 15:42
It is more pythonic. – cardamom Mar 05 '19 at 14:47
I don't know how relevant this is, but when I try FOO_BAR10.34 it gives me 'FOO_BAR10.' and '34', and when I re-apply mysplit to the first element, it gives me the same thing. I know my issue is slightly different. – Jack Armstrong Jan 13 '20 at 21:17
But I can slice 'FOO_BAR10.' to remove the '.', then re-apply the function to get what I want. +1. – Jack Armstrong Jan 13 '20 at 21:23
To split off a float at the end, add '.' to the digits in the rstrip() call. – Mike Apr 25 '20 at 19:10
there is `string.digits` – droid192 Nov 06 '20 at 20:41
Nice, no imports needed. Only that this solution assumes the given text is of the format string+numeric. In the case of numeric+string, one would need to resort to lstrip(). But the OP's question already assumes the first case. – RandomWalker Aug 25 '21 at 00:55
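The variations mentioned in these comments (string.digits, adding '.' for a trailing float, and lstrip() for number-first input) could be combined roughly as follows. The names split_trailing and split_leading are hypothetical, and this is a sketch rather than part of the original answer.

```python
import string

def split_trailing(s, chars=string.digits):
    """Split s into (head, tail), where tail is the run of `chars` at the end."""
    head = s.rstrip(chars)
    return head, s[len(head):]

def split_leading(s, chars=string.digits):
    """Split s into (head, tail), where head is the run of `chars` at the start."""
    tail = s.lstrip(chars)
    return s[:len(s) - len(tail)], tail

print(split_trailing('foofo21'))                            # ('foofo', '21')
print(split_trailing('FOO_BAR10.34', string.digits + '.'))  # ('FOO_BAR', '10.34')
print(split_leading('21foofo'))                             # ('21', 'foofo')
```

Note that adding '.' to the stripped characters also removes a trailing '.' that is not part of a number.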

score 33 · Answer 3 · answered Jan 10 '09 at 00:54

Yet Another Option:

>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]

answered Jan 10 '09 at 00:54 · jfs

Neat. Or even: [re.split(r'(\d+)', s)[0:2] for s in ...] getting rid of that extra empty string. Note though that compared with \w this is equivalent to [^|\d]. – PEZ Jan 10 '09 at 11:32
@PEZ: There may be more than one pair and an empty string may be at the beginning of the list. You could remove empty strings with `[filter(None, re.split(r'(\d+)', s)) for s in ('foofo21','a1')]` – jfs Jan 10 '09 at 19:47
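In Python 3, filter() returns a lazy iterator rather than a list, so a list comprehension is one way to get the same effect. A small sketch, with split_nonempty as a hypothetical helper name:

```python
import re

def split_nonempty(s):
    """Split s around digit runs and drop the empty strings re.split leaves at the edges."""
    return [part for part in re.split(r'(\d+)', s) if part]

print(split_nonempty('foofo21'))  # ['foofo', '21']
print(split_nonempty('12ab34'))   # ['12', 'ab', '34']
```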

Federico A. Ramponi · Answer 4 · 2009-01-09T23:19:24.350

29

>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(string).groups() for string in strings])

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

edited Jan 09 '09 at 23:19
answered Jan 09 '09 at 23:12 · Federico A. Ramponi

PEZ · Answer 5 · 2009-01-09T23:49:55.863

11

I'm always the one to bring up findall() =)

>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I'm using a simpler (less to type) regex than most of the previous answers.

edited Jan 09 '09 at 23:49
answered Jan 09 '09 at 23:40 · PEZ

r'\w' matches '_'. I don't see '_' in the question. – jfs Jan 10 '09 at 00:52
I don't see A-Z in the question. It says "text and numbers". – PEZ Jan 10 '09 at 09:39
@PEZ: If you allow any text except numbers then your regexp should be r'(\D+)(\d+)'. – jfs Jan 10 '09 at 13:33
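To make the difference between the character classes discussed here concrete, the following sketch runs the three patterns from this thread against a few inputs. The helper try_split and the sample strings with '_' and '-' are assumptions added for illustration.

```python
import re

def try_split(pattern, s, flags=0):
    """Return the two captured groups, or None if s does not match at its start."""
    m = re.match(pattern, s, flags)
    return m.groups() if m else None

for s in ('foofo21', 'foo_bar12', 'foo-bar12'):
    print(s,
          try_split(r'([a-z]+)([0-9]+)', s, re.I),  # letters only
          try_split(r'(\w+?)(\d+)', s),             # letters, digits and '_'
          try_split(r'(\D+)(\d+)', s))              # anything except digits
```

All three agree on 'foofo21'. Only the \w and \D patterns accept 'foo_bar12', and only \D accepts 'foo-bar12'.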

score 9 · Answer 6 · edited Aug 05 '19 at 13:48

Here is a simple function that separates multiple words and numbers in a string of any length; the re.match answers above only capture the first word and the first number. I hope this helps others in the future.

def seperate_string_number(string):
    if not string:  # an empty string has no groups
        return []
    previous_character = string[0]
    groups = []
    newword = string[0]
    for i in string[1:]:
        # extend the current group while the character type stays the same
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            groups.append(newword)
            newword = i

        previous_character = i

    groups.append(newword)  # flush the last group (also covers one-character input)
    return groups

print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']
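For comparison, a regex can also handle any number of alternating runs if re.findall is used instead of re.match. This is a different technique from the loop above, sketched under the assumption that everything that is not a digit counts as text; split_digit_runs is a hypothetical name.

```python
import re

def split_digit_runs(s):
    """Return the alternating runs of digits and non-digits in s, in order."""
    return re.findall(r'\d+|\D+', s)

print(split_digit_runs('10in20ft10400bg'))
# ['10', 'in', '20', 'ft', '10400', 'bg']
```

Unlike the loop, this keeps punctuation and spaces inside the neighbouring text run rather than splitting them off.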

score 3 · Answer 7 · answered Nov 19 '15 at 19:28

import re

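s = input()
m = re.match(r"([a-zA-Z]+)([0-9]+)", s)
if m:  # re.match returns None when the input does not fit the pattern
    print(m.group(0))  # whole match
    print(m.group(1))  # text part
    print(m.group(2))  # number part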

answered Nov 19 '15 at 19:28 · Bug Hunter 219

score 3 · Answer 8 · answered Apr 25 '19 at 06:27

Without regex, using the built-in str.isdigit() method. This only works if the string starts with text and ends with a number.

def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index],item[index:]]

print(text_num_split("foobar12345"))

OUTPUT :

['foobar', '12345']

score 0 · Answer 9 · edited Jul 28 '21 at 21:58

Here is a simple solution for that problem, no need for regex:

user = input('Input: ')  # e.g. user = 'foobar12345'
int_list, str_list = [], []

for item in user:
    try:
        item = int(item)  # searching for integers in your string
    except ValueError:
        str_list.append(item)
    else:  # digits are stored as str because str.join only works with strings
        int_list.append(str(item))

string = ''.join(str_list)
integer = int(''.join(int_list)) if int_list else None  # use ''.join(int_list) to keep it a string

final = [string, integer]  # you could also build a dictionary: d = {string: integer}
print(final)

are you sure this is correct `item = int(item) # searching for integers in your string`!!!!??? — Abu Shumon, Jul 28 '21 at 16:39

score 0 · Answer 10 · answered Oct 08 '20 at 12:38

This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.

def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []

    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
            
        # If no compiled string, begin with letter
        else:
            compile = letter
        
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
        
    return numbers
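A more compact variant of the same run-grouping idea can be sketched with itertools.groupby. Note that this does use a standard-library import, unlike the function above, and the names group_runs and numbers_only are hypothetical.

```python
from itertools import groupby

def group_runs(text):
    """Group consecutive characters that are all digits or all non-digits."""
    return [''.join(run) for _, run in groupby(text, key=str.isdigit)]

def numbers_only(text):
    """Keep only the digit runs, mirroring what getNumbers returns."""
    return [run for run in group_runs(text) if run.isdigit()]

print(group_runs('10in20ft10400bg'))    # ['10', 'in', '20', 'ft', '10400', 'bg']
print(numbers_only('10in20ft10400bg'))  # ['10', '20', '10400']
```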

score 0 · Answer 11 · answered May 18 '22 at 18:42

In addition to @Evan's answer: if the incoming string has the pattern 21foofo (number first), swap the two groups in the re.match pattern like this.

import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
print(items)
>> ("21", "foofo")

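Otherwise, if you keep the letters-first pattern, re.match returns None for '21foofo', items is never assigned, and the final print(items) fails with a NameError at module level, or with "UnboundLocalError: local variable 'items' referenced before assignment" when the code runs inside a function.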
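One way to avoid that error altogether is to wrap the match in a function that returns None when the pattern does not fit. A minimal sketch, assuming a hypothetical helper named split_number_text:

```python
import re

def split_number_text(s):
    """Return (number, text) for strings like '21foofo', or None if s does not match."""
    match = re.match(r"([0-9]+)([a-z]+)", s, re.I)
    if match is None:
        return None  # explicit result instead of an unassigned variable
    return match.groups()

print(split_number_text('21foofo'))  # ('21', 'foofo')
print(split_number_text('foofo21'))  # None
```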
