3

I have a collection of strings like:

"0"
"90/100"
None
"1-5%/34B-1"
"-13/7"

I would like to convert these into integers (or None) by reading digits from the beginning of each string and stopping at the first non-digit character. The above data would thus become:

0
90
None
1
None

I tried doing something like the code below, but ran into multiple problems, such as the int(new_n) line raising ValueError when new_n was just an empty string. And even without that, the code just looks horrible:

def pick_right_numbers(old_n):
    new_n = ''
    numbers = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
    if old_n is None:
        return None
    else:
        for n in old_n:
            if n in numbers:
                new_n += n
            else:
                return int(new_n)
        if new_n:
            return int(new_n)
        else:
            return None

Could someone nudge me to the right direction with this?

jonrsharpe
  • The code doesn't look too bad. I'd replace the `else:return int(new_n)` with `else: break`. – Jasper Jul 15 '16 at 07:53
  • `from itertools import takewhile; text = ''.join(takewhile(str.isdigit, input or "")); result = int(text) if text else None`? – Bakuriu Jul 15 '16 at 07:55

4 Answers

1

Is this the sort of thing you're looking for?

import re
data = ['0', '90/100', None, '1-5%/34B-1', '-13/7']

def pick_right_numbers(old_n):
    if old_n is None:
        return None
    else:
        digits = re.match("([0-9]*)",old_n).groups()[0]
        if digits.isdigit():
            return int(digits)
        else:
            return None

for string in data:
    result = pick_right_numbers(string)
    if result is not None:
        print("Matched section is : {0:d}".format(result))

It uses the re module (regular-expression matching) to detect a block of digits at the start of the string: re.match only matches at the beginning of a string, whereas re.search would find a block anywhere in it. The [0-9]* pattern always matches, possibly with the empty string (as it does for the last data element, '-13/7'), so the code confirms that the captured text really is digits before converting it to an integer to return.
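A slightly shorter variant of the same idea, offered only as a sketch: if the pattern requires at least one digit ([0-9]+ instead of [0-9]*), a failed match already means "no leading number", so the separate isdigit() check is not needed. The names leading_int and LEADING_DIGITS are made up for this illustration.

```python
import re

# Hypothetical helper names; the pattern requires one or more ASCII digits.
LEADING_DIGITS = re.compile(r"[0-9]+")


def leading_int(text):
    """Return the integer formed by the leading digits of text, or None."""
    if text is None:
        return None
    # re.match anchors at the start of the string, so this only succeeds
    # when the very first character is a digit.
    match = LEADING_DIGITS.match(text)
    return int(match.group()) if match else None


data = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
print([leading_int(item) for item in data])
```

Compiling the pattern once is a small convenience when the function is called many times; it does not change the result.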

R.Sharp
0
>>> import re
>>> s = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
>>> [int(c) if c else None for c in (re.sub('([0-9]*).*', r'\1', str(x)) for x in s)]
[0, 90, None, 1, None]

How it works

We have a list comprehension wrapped around a generator expression. The inner generator expression removes everything from the elements of list s except the initial numbers:

>>> list(re.sub('([0-9]*).*', r'\1', str(x)) for x in s)
['0', '90', '', '1', '']

The outer list comprehension converts those strings, if nonempty, to integers or otherwise to None:

>>> [int(c) if c else None for c in ('0', '90', '', '1', '')]
[0, 90, None, 1, None]

Alternative: using takewhile

As per Bakuriu's comment, we can use itertools.takewhile in place of re.sub:

>>> from itertools import takewhile
>>> [int(c) if len(c) else None for c in (''.join(takewhile(str.isdigit, x or "")) for x in s)]
[0, 90, None, 1, None]

Modifications to original code

Alternatively, we can modify the original code:

def pick_right_numbers(old_n):
    if old_n is None:
        return None
    else:
        new_n = ''
        for n in old_n:
            if not n.isdigit():
                break
            new_n += n 
        return int(new_n) if len(new_n) else None

This code produces the output:

>>> [pick_right_numbers(x) for x in s]
[0, 90, None, 1, None]
John1024
0

A basic way to do this would be:

input_list = ["0", "90/100", None,  "1-5%/34B-1", "-13/7"]
char_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
output_list = []

for input_str in input_list:

    if isinstance(input_str, str):
        i = 0
        for input_char in input_str:
            if input_char in char_list:
                i += 1
            else:
                break
    else:
        i = 0

    if i:
        output = int(input_str[0:i])
    else:
        output = None

    output_list.append(output)

But there are quite a few variants. If this is a function that you would call 10,000+ times per day, it would be smart to profile the alternatives before picking one.
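As a rough sketch of what such profiling could look like, the standard timeit module can time two candidates on the sample data. The function names with_loop and with_takewhile are made up for this illustration, and the timings depend entirely on the machine and Python version, so no numbers are shown here.

```python
import timeit
from itertools import takewhile

DATA = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]


def with_loop(value):
    """Character-by-character loop, building the digit string as it goes."""
    if value is None:
        return None
    digits = ""
    for char in value:
        if char not in "0123456789":
            break
        digits += char
    return int(digits) if digits else None


def with_takewhile(value):
    """Same result, using itertools.takewhile to collect the leading digits."""
    text = "".join(takewhile(str.isdigit, value or ""))
    return int(text) if text else None


for func in (with_loop, with_takewhile):
    # Time 10,000 passes over the sample data for each candidate.
    seconds = timeit.timeit(lambda: [func(x) for x in DATA], number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 passes")
```

Five short strings are not representative of real input, so it would be worth repeating the measurement on a sample of the actual data.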

Edit: it might be smart to consider what a string is in Python 2 vs. 3 (see "What is the difference between isinstance('aaa', basestring) and isinstance('aaa', str)?").

Edit 2: see how Bakuriu's solution simplifies this:

from itertools import takewhile
input_list = ["0", "90/100", None,  "1-5%/34B-1", "-13/7"]
output_list = []
for input_str in input_list:
    text = ''.join(takewhile(str.isdigit, input_str or ""))        
    output_list.append(int(text) if text else None)

(So I think he should post that as an answer, to be honest ;)

Carst
  • just seeing Bakuriu's comment, I think that would be the most pythonic answer and probably the fastest as it uses itertools – Carst Jul 15 '16 at 08:08
0

There are various methods to check if an object is a number. See for instance this answer.

However, you only need to check one character at a time, so your method is actually fine. The set of digits is tiny and will stay in cache, so each membership test will be fast.

Note that you can just write it in a nicer way:

if n in "0123456789":

Another possibility, probably the fastest, is checking the range, treating them as numerical values via ASCII representation (using the fact that digits are contiguous in that representation, and are in the order you expect):

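zero = ord('0')
nine = ord('9')
new_n = ''
for n in old_n:
    nn = ord(n)
    if zero <= nn <= nine:
        new_n += n   # still inside the leading run of digits
    else:
        break        # the first non-digit ends the scan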

The most elegant way, of course, would be to call the built-in isdigit() on it; you save on all the clutter and make your intent completely clear. Note that it may accept more than you ask for: it returns True for any character that Unicode classifies as a digit, not only ASCII 0-9. But you're unlikely to encounter such cases. Also note that, because of this, it will likely be slower than your implementation.
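To make the Unicode caveat concrete, here is a small sketch. The superscript two (U+00B2) counts as a digit for str.isdigit(), yet int() refuses it, so an isdigit()-based loop could still raise ValueError on unusual input. The helper name pick_ascii_digits is made up for this illustration; it restricts the test to the ten ASCII digits.

```python
SUPERSCRIPT_TWO = "\u00b2"  # the character that prints as a raised 2

# str.isdigit() accepts it ...
print(SUPERSCRIPT_TWO.isdigit())  # True

# ... but int() does not know how to convert it.
try:
    int(SUPERSCRIPT_TWO)
except ValueError as error:
    print("int() rejected it:", error)


def pick_ascii_digits(old_n):
    """Hypothetical variant that only accepts the ASCII digits 0-9."""
    if old_n is None:
        return None
    new_n = ""
    for char in old_n:
        if char not in "0123456789":
            break
        new_n += char
    return int(new_n) if new_n else None


print(pick_ascii_digits(SUPERSCRIPT_TWO + "5"))  # None
print(pick_ascii_digits("42abc"))                # 42
```

Whether this matters depends on where the strings come from; for data known to be plain ASCII, isdigit() behaves the same as the explicit check.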

Note that you also need to handle new_n == '' inside the loop: your early return calls int(new_n) even when no digit has been collected yet. The best way to avoid repeating yourself is to break out of the loop and fall through to the final if:

def pick_right_numbers(old_n):
    new_n = ''
    if old_n is None:
        return None
    else:
        for n in old_n:
            if n.isdigit():
                new_n += n
            else:
                break
        if new_n:
            return int(new_n)
        else:
            return None

Of course, if you need speed you will probably have to change the approach, as you are growing a string one character at a time in a loop. But if this logic makes sense to you, only complicate it if it turns out to be the bottleneck of the program.
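One possible change of approach, sketched here under the assumption that only ASCII digits should count: find the position of the first non-digit and slice the string once, instead of appending character by character. The function name pick_right_numbers_sliced is made up for this illustration, and whether it is actually faster would need to be measured.

```python
def pick_right_numbers_sliced(old_n):
    """Return the leading digits of old_n as an int, or None if there are none."""
    if old_n is None:
        return None
    # Index of the first character that is not an ASCII digit; if every
    # character is a digit, use the full length of the string.
    end = next(
        (index for index, char in enumerate(old_n) if char not in "0123456789"),
        len(old_n),
    )
    # A single slice replaces the repeated string concatenation.
    return int(old_n[:end]) if end else None


data = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
print([pick_right_numbers_sliced(item) for item in data])
```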

Francesco Dondi