Cleanest way to obtain the numeric prefix of a string

Question

What is the cleanest way to obtain the numeric prefix of a string in Python?

By "clean" I mean simple, short, readable. I couldn't care less about performance, and I suppose that it is hardly measurable in Python anyway.

For example:

Given the string '123abc456def', what is the cleanest way to obtain the string '123'?

The code below obtains '123456':

input = '123abc456def'
output = ''.join(c for c in input if c in '0123456789')

So I am basically looking for some way to replace the if with a while.
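Taken literally, replacing the `if` with a `while` means stopping at the first non-digit instead of skipping it. Here is a minimal sketch of that as an explicit loop. It uses the same `'0123456789'` membership test as the code above. The function name `numeric_prefix` is just for illustration.

```python
def numeric_prefix(s: str) -> str:
    """Return the leading run of ASCII digits in s (possibly empty)."""
    i = 0
    # Advance while the current character is one of '0'..'9'.
    while i < len(s) and s[i] in '0123456789':
        i += 1
    return s[:i]


print(numeric_prefix('123abc456def'))  # 123
```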

@MaxU: I was hoping that there would be a simple "string operation" that could save me the burden of regular expression, but if you think that there is no other alternative then yes. — barak manos, Mar 08 '16 at 12:09
@AIG: No, it varies (otherwise, I would have just used `input[0:3]`). — barak manos, Mar 08 '16 at 12:12
@ForceBru: Thank you. Here below there is an answer more suitable to my question than the accepted answer in the question that you have suggested as duplicate (i.e., the answer below is "cleaner"), so I will accept it here. — barak manos, Mar 08 '16 at 12:19
I just reopened the question because the suggested duplicates were not correct duplicates of this question. Note that in this question the OP wants a Pythonic way to extract the leading digits when the string contains other digits as well. — Mazdak, Mar 08 '16 at 12:57
How do you define "numeric"? `'0'` - `'9'`? Or all numeric Unicode code points? If you use the latter, you won't be able to parse the prefix as an integer. — CodesInChaos, Mar 09 '16 at 08:33
@CodesInChaos: Well, I was looking for a way to retrieve the decimal non-negative integer prefix of the string, so basically yes, `'0'` thru `'9'`. — barak manos, Mar 09 '16 at 09:06
@barakmanos In that case all the `str.isdigit` answers are not what you want. I didn't check, but I suspect the `\d` regex answers are wrong as well. — CodesInChaos, Mar 09 '16 at 09:33

score 57 · Accepted Answer · edited Mar 08 '16 at 14:01


You can use itertools.takewhile, which iterates over your string (the iterable argument) until it encounters the first item for which the predicate function returns False:

>>> from itertools import takewhile
>>> input = '123abc456def'
>>> ''.join(takewhile(str.isdigit, input))
'123'
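As the comments on the question point out, `str.isdigit` also accepts non-ASCII digit characters such as `'²'`, and `int()` cannot parse those. The sketch below keeps the same `takewhile` approach but only accepts `'0'` through `'9'`. The predicate `is_ascii_digit` is a hypothetical helper and is not part of the answer.

```python
from itertools import takewhile

def is_ascii_digit(c: str) -> bool:
    # Accept only the ten ASCII digits, unlike str.isdigit.
    return '0' <= c <= '9'

def numeric_prefix(s: str) -> str:
    return ''.join(takewhile(is_ascii_digit, s))

print(numeric_prefix('123abc456def'))               # 123
print(''.join(takewhile(str.isdigit, '\u00b23x')))  # ²3 (superscript two passes isdigit)
print(repr(numeric_prefix('\u00b23x')))             # ''
```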

edited Mar 08 '16 at 14:01

Matthieu M.


answered Mar 08 '16 at 12:09

Mazdak


Please vote to reopen this question. There are two suggested duplicates. One is a different question (!!!), and the other one is itself closed. In addition, the answer here matches my question better than the accepted answer to that (closed) question, since it addresses my requirement for the cleanest possible solution. Thank you. – barak manos Mar 08 '16 at 12:54
Thank you very much. Can you please have a look at the solution proposed by @demented hedgehog below? It seems to be very "clean", though I would hate to unaccept your answer. – barak manos Mar 08 '16 at 13:06

@barakmanos I left a comment there. It's shorter, but it is not optimal in terms of memory use. It also uses slicing and two `len` calls. – Mazdak Mar 08 '16 at 13:25
If we're going to be like that, the question specifically says performance is not important (and that isn't always true anyway). – demented hedgehog Mar 08 '16 at 14:08
Just tested your example versus mine, and yours takes 3 times longer than mine! (using timeit.default_timer(), including the import time for itertools)... and again, performance isn't important for this question – demented hedgehog Mar 08 '16 at 14:13
@dementedhedgehog Yes, your second answer is more optimized in terms of runtime in this case. But first, it doesn't give a general approach, and it is not a good solution in terms of memory use. – Mazdak Mar 08 '16 at 17:58

I'll upvote any solution that looks like Haskell. This is definitely the "cleanest" solution in my opinion. – ApproachingDarknessFish Mar 09 '16 at 00:14

score 13 · Answer 2 · answered Mar 08 '16 at 12:10

This is the simplest way to extract a list of numbers from a string:

>>> import re
>>> input = '123abc456def'
>>> re.findall(r'\d+', input)
['123', '456']

If you need a list of ints, you might use the following code (in Python 3, `map` returns an iterator, so wrap it in `list`):

>>> list(map(int, re.findall(r'\d+', input)))
[123, 456]

Then index the result with `[0]` to get the first number.

edited Mar 08 '16 at 12:16

answered Mar 08 '16 at 12:10

Tal Avissar



That's a bold statement. Though I think it's a good solution. – demented hedgehog Mar 08 '16 at 12:19

Perhaps `match()` would be more suitable, since OP needs only the starting digits. – user Mar 08 '16 at 12:22
For `'xyz123abc456def'` `re.findall` will give you the same result, but `'123'` isn't a prefix of that string! You would need `'^\d*'` as regular expression to find a (possibly empty) prefix. – UlFie May 21 '22 at 19:04
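To illustrate the last comment: `re.findall(r'\d+', s)[0]` returns the first run of digits anywhere in the string, while an anchored match returns only a true prefix. Here is a minimal sketch. It uses `[0-9]` instead of `\d` so that only ASCII digits count, which is an assumption about what "numeric" means here.

```python
import re

s = 'xyz123abc456def'

# First run of digits anywhere in the string: not a prefix.
print(re.findall(r'\d+', s)[0])                     # 123

# re.match is anchored at the start; '*' allows an empty match,
# so .group() is always safe and returns '' when there is no prefix.
print(repr(re.match(r'[0-9]*', s).group()))        # ''
print(re.match(r'[0-9]*', '123abc456def').group())  # 123
```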

score 6 · Answer 3 · answered Mar 08 '16 at 14:18


Simpler version (leaving the other answer as there's some interesting debate about which approach is better)

# "or None" covers an all-digit string: -0 == 0, and input[:0] would be ''
input[:-len(input.lstrip("0123456789")) or None]
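The negative slice has one edge case. If the whole string is digits, `lstrip` returns `''`, so its length is `0`. Then `input[:-0]` is the same as `input[:0]`, which is the empty string. Here is a sketch of the same `lstrip` idea that computes the prefix length directly instead (the function name is illustrative):

```python
DIGITS = '0123456789'

def numeric_prefix(s: str) -> str:
    # Prefix length = total length minus the length of what remains
    # after stripping the leading digits.
    n = len(s) - len(s.lstrip(DIGITS))
    return s[:n]


print(repr('123'[:-len('123'.lstrip(DIGITS))]))  # '' (the bare negative slice)
print(numeric_prefix('123'))                     # 123
print(numeric_prefix('123abc456def'))            # 123
```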

answered Mar 08 '16 at 14:18

demented hedgehog


This one is pythonic ;), but as I said `takewhile()` is more optimized in terms of memory use. – Mazdak Mar 08 '16 at 17:56
That is true. I'll have to take a second look at itertools. I hadn't heard about takewhile until now. – demented hedgehog Mar 08 '16 at 21:24

score 4 · Answer 4 · answered Mar 08 '16 at 12:59


input[:len(input) - len(input.lstrip("0123456789"))]

answered Mar 08 '16 at 12:59

demented hedgehog


This is not an optimized approach in terms of memory use (especially when you are dealing with larger strings), because you are creating a stripped copy of the main string and holding it in memory. – Mazdak Mar 08 '16 at 13:23
Yes. It's not particularly efficient, but the post specifically says it doesn't care about performance. The approach is simple, and in practice the performance usually won't matter. You'd have to be using big strings to care. There's overhead in compiling regular expressions too, for example. If you really care about speed, do it in C. – demented hedgehog Mar 08 '16 at 13:30
It also depends on what portion of the string is the prefix. If it's a million digits followed by "x", your approach is going to be slow too. Possibly slower, since you've got the copy as well as a bunch of function call overhead? I'd be interested to see a timing comparison of our two approaches vs. string length and prefix length. – demented hedgehog Mar 08 '16 at 13:38
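For anyone who wants to run the timing comparison asked for here, this is a rough sketch using the standard `timeit` module. The string sizes and repetition count are arbitrary choices, and no results are claimed.

```python
import timeit
from itertools import takewhile

def with_takewhile(s: str) -> str:
    return ''.join(takewhile(str.isdigit, s))

def with_lstrip(s: str) -> str:
    return s[:len(s) - len(s.lstrip('0123456789'))]

# Vary both the digit-prefix length and the length of the non-digit tail.
for prefix_len in (10, 10_000):
    for tail_len in (10, 10_000):
        s = '7' * prefix_len + 'x' * tail_len
        for fn in (with_takewhile, with_lstrip):
            t = timeit.timeit(lambda: fn(s), number=100)
            print(f'{fn.__name__:15} prefix={prefix_len:6} tail={tail_len:6} {t:.4f}s')
```

Both functions return the same prefix. Only the relative timings across the four size combinations are of interest.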

score 1 · Answer 5 · answered Mar 08 '16 at 12:09


Here is my way:

output = input[:next((i for i, v in enumerate(input) if not v.isdigit()), None)]
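The `None` default in `next(...)` is what handles a string made only of digits. In that case the generator never yields, so `next` returns `None`, and `input[:None]` is the whole string. Here is a sketch of the same approach written as a function, with that case spelled out (the name `numeric_prefix` is illustrative):

```python
def numeric_prefix(s: str) -> str:
    # Index of the first non-digit, or None if there is none.
    stop = next((i for i, ch in enumerate(s) if not ch.isdigit()), None)
    # s[:None] is the whole string, so an all-digit input is returned intact.
    return s[:stop]


print(numeric_prefix('123abc456def'))  # 123
print(numeric_prefix('2016'))          # 2016
```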

answered Mar 08 '16 at 12:09

zondo


score 1 · Answer 6 · answered Mar 08 '16 at 12:10

One way, though not very efficient since it works through the whole string without breaking early, would be:

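input_string = '123abc456def'
# range starts at 1 and includes len(input_string), so an all-digit string is returned whole;
# raises IndexError if the string does not start with a digit
[input_string[:c] for c in range(1, len(input_string) + 1) if input_string[:c].isdigit()][-1]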

This builds a list of every prefix, in order of increasing length, that consists only of digits. The last element is the one you are looking for, because it is the longest prefix that is still all digits.
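The same idea of finding the longest all-digit prefix can stop at the first hit. Try lengths from longest to shortest and take the first match with `next`. This also returns a default of `''` instead of raising `IndexError` when no prefix exists. It is still quadratic in the worst case, so this sketch is about tidiness, not speed:

```python
def numeric_prefix(s: str) -> str:
    # Try prefixes from longest to shortest; the first all-digit one wins.
    return next((s[:n] for n in range(len(s), 0, -1) if s[:n].isdigit()), '')


print(numeric_prefix('123abc456def'))  # 123
print(repr(numeric_prefix('abc')))     # ''
```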

edited Mar 08 '16 at 12:16

answered Mar 08 '16 at 12:10

MSeifert


score 1 · Answer 7 · answered Mar 08 '16 at 12:23


You could use a regex:

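import re
# re.match anchors at the start of the string; it returns None (so .group() raises
# AttributeError) if the string does not begin with a digit
initialNumber = re.match(r'(?P<number>\d+)', yourInput).group('number')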
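`re.match` returns `None` when the string does not start with a digit, so calling `.group()` directly would fail. Also, in Python 3 `\d` matches any Unicode decimal digit (for example U+0663, ARABIC-INDIC DIGIT THREE) unless the `re.ASCII` flag is given, which bears on the comments under the question. Here is a sketch covering both points (the function name is illustrative):

```python
import re

def numeric_prefix(s: str) -> str:
    # re.ASCII restricts \d to '0'-'9'; without it, other Unicode digits match too.
    m = re.match(r'\d+', s, re.ASCII)
    return m.group() if m else ''


print(numeric_prefix('123abc456def'))      # 123
print(repr(numeric_prefix('abc')))         # ''
print(re.match(r'\d+', '\u06634').group() == '\u06634')  # True without re.ASCII
print(repr(numeric_prefix('\u06634')))     # ''
```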

answered Mar 08 '16 at 12:23

Mr. E


score 1 · Answer 8 · answered Apr 10 '16 at 20:41


import re
input = '123abc456def'
output = re.findall(r'^\d+', input)

This returns `['123']` too.

answered Apr 10 '16 at 20:41

Xiflado


score 0 · Answer 9 · answered Mar 22 '16 at 09:39


Another regexp version strips away everything starting with the first non-digit:

import re
output = re.sub(r'\D.*', '', input)

answered Mar 22 '16 at 09:39

Marius Gedminas

