Extracting Numbers from a String Without Regular Expressions

Question

I am trying to extract all the numbers from a string composed of digits, symbols and letters. Multi-digit numbers have to be extracted whole: for example, from "shsgd89shs2011%%5swts" I have to pull the numbers out as they appear (89, 2011 and 5). So far, what I have loops through the string and returns all the numbers incrementally, which I like, but I cannot figure out how to make it stop after finishing with one set of digits:

    def StringThings(strng):
        nums = []
        number = ""
        for each in range(len(strng)):
            if strng[each].isdigit():
                number += strng[each]
            else:
                continue
            nums.append(number)

        return nums

Running this on the value "6wtwyw66hgsgs" returns ['6', '66', '666']. What simple way is there of breaking out of the loop once I have gotten what I needed?

You could try the answers to a similar question, "python extract numbers from a string": http://stackoverflow.com/questions/4289331/python-extract-numbers-from-a-string — TatiAuza, Sep 19 '15 at 23:40

Padraic Cunningham · Answer 1 · 2015-09-20T12:03:52.913

Keeping your approach, use a temp variable to concatenate each sequence of digits, and yield the group each time you encounter a non-digit, provided the temp variable is not an empty string:

def string_things(strng):
    temp = ""
    for ele in strng:
        if ele.isdigit():
            temp += ele
        elif temp: # if we have a sequence
            yield temp
            temp = "" # reset temp
    if temp: # catch ending sequence
        yield temp

Output

In [9]: s = "shsgd89shs2011%%5swts"
In [10]: list(string_things(s))
Out[10]: ['89', '2011', '5']

In [11]: s ="67gobbledegook95"
In [12]: list(string_things(s))
Out[12]: ['67', '95']
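
The generator yields each run as a string. If integer values are wanted, as the "(89, 2011 and 5)" in the question suggests, one possible follow-up is to convert each group with int(). This is only a sketch: the helper name extract_ints is hypothetical and not part of the answer above, and it assumes plain ASCII digits, since str.isdigit() also accepts a few characters (such as superscript digits) that int() rejects.

```python
def string_things(strng):
    """Yield each run of consecutive digits in strng as a string."""
    temp = ""
    for ele in strng:
        if ele.isdigit():
            temp += ele
        elif temp:  # a run of digits just ended
            yield temp
            temp = ""  # reset for the next run
    if temp:  # a run that reaches the end of the string
        yield temp


def extract_ints(strng):
    # Hypothetical helper: convert each yielded digit group to an int.
    return [int(group) for group in string_things(strng)]


print(extract_ints("shsgd89shs2011%%5swts"))  # [89, 2011, 5]
```

No imports are needed, so this stays within a "no modules" restriction.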

Or you could translate the string, replacing letters and punctuation with spaces, then split:

from string import ascii_letters, punctuation
s = "shsgd89shs2011%%5swts"

replace = ascii_letters+punctuation

# str.maketrans is the Python 3 spelling; Python 2 used string.maketrans
tbl = str.maketrans(replace, " " * len(replace))
print(s.translate(tbl).split())
['89', '2011', '5']
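
The same replace-then-split idea can be sketched without importing anything, which may matter if no modules are allowed. The function name digits_by_split is hypothetical; instead of a translation table, this version treats every character that is not a digit as a separator.

```python
def digits_by_split(s):
    # Replace every non-digit character with a space ...
    spaced = "".join(ch if ch.isdigit() else " " for ch in s)
    # ... then split on whitespace, which drops the empty gaps.
    return spaced.split()


print(digits_by_split("shsgd89shs2011%%5swts"))  # ['89', '2011', '5']
```

Unlike the translation table, this also handles separators that are neither ASCII letters nor punctuation, at the cost of building an intermediate string one character at a time.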

Wow. Another great solution. I feel dumber by the minute. First time running into "yield". Thank you. :-) — Unpossible, Sep 21 '15 at 00:51

Oleg Gopkolov · Answer 2 · 2015-09-20T12:21:23.107


from itertools import groupby

L2 = []
file_Name1 = 'shsgd89shs2011%%5swts'
# groupby splits the string into consecutive runs; k is True for digit runs
for k, g in groupby(file_Name1, str.isdigit):
    if k:
        L2.append("".join(g))

print(L2)

Result ['89', '2011', '5']
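
To see why this works, it can help to look at what groupby actually produces before the filtering step. The sketch below is an illustration only; the variable names text, runs and numbers are not from the answer above.

```python
from itertools import groupby

text = 'shsgd89shs2011%%5swts'

# Materialise each group so the (key, run) pairs can be inspected.
# The key is the result of str.isdigit for every character in the run.
runs = [(is_digit, "".join(group)) for is_digit, group in groupby(text, str.isdigit)]
print(runs)

# Keep only the runs whose key is True, i.e. the digit runs.
numbers = [run for is_digit, run in runs if is_digit]
print(numbers)  # ['89', '2011', '5']
```

Each change in the key starts a new group, so consecutive digits always end up together in one run.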

answered Sep 20 '15 at 12:13, edited Sep 20 '15 at 12:21

Thanks! But I am not allowed to use any modules, at all. – Unpossible Sep 21 '15 at 00:50
@Sina `I am not allowed to use any modules` Where did you write this??? Please, in the future, put all special conditions in the question (first post) so that people don't waste their time. I spent a whole hour finding an answer for you. – Oleg Gopkolov Sep 21 '15 at 09:39
I'm sorry, Oleg. But I did say in the title no regular expressions. I suppose that was insufficient information. Thanks for your help, it's still very valuable information for me. – Unpossible Sep 21 '15 at 12:33
@Sina Where do you see "regular expressions" in my code???? I use "groupby" from the "itertools" library. – Oleg Gopkolov Sep 21 '15 at 17:13
@Sina On the contrary, I was looking for a solution without using the regex library. Where do you see `import re` in the code excerpt shown??? – Oleg Gopkolov Sep 21 '15 at 17:37

Eirik Birkeland · Accepted Answer · 2015-09-20T11:15:31.987

Updated to account for trailing numbers:

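def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]

        # A run of digits just ended: store it and reset the buffer.
        if each != 0:
            if not strng[each].isdigit():
                if strng[each-1].isdigit():
                    nums.append(number)
                    number = ""

        # Last character: store a trailing run, if one is still buffered.
        # Checked after the reset above so that a run ending one character
        # before the end (e.g. "12c") is not appended twice.
        if each == len(strng)-1:
            if number != '':
                nums.append(number)
    return nums

print(StringThings("shsgd89shs2011%%5swts34"))
# returns ['89', '2011', '5', '34']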

So, when we reach a character that is not a digit, and the previously observed character was a digit, we append the contents of number to nums and then empty our temporary container number, so that it does not carry over the old digits.

Note, I don't know Python so the solution may not be very pythonic.

Alternatively, save yourself all the work and just do:

import re
print(re.findall(r'\d+', 'shsgd89shs2011%%5swts'))

Thanks, but I am not allowed to use re, and I have tried this before, and it doesn't work in all cases. For instance "67gobbledegook95" returns '67'. — Unpossible, Sep 20 '15 at 02:04
