
What is the cleanest way to obtain the numeric prefix of a string in Python?

By "clean" I mean simple, short, readable. I couldn't care less about performance, and I suppose that it is hardly measurable in Python anyway.

For example:

Given the string '123abc456def', what is the cleanest way to obtain the string '123'?

The code below obtains '123456':

input = '123abc456def'
output = ''.join(c for c in input if c in '0123456789')

So I am basically looking for some way to replace the if with a while.
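A minimal sketch of that "`while` instead of `if`" idea as a plain loop. It assumes only ASCII '0' to '9' count as digits, and the function name `numeric_prefix` is hypothetical:

```python
def numeric_prefix(s: str) -> str:
    """Return the leading run of ASCII digits in s (possibly empty)."""
    i = 0
    # Advance while the current character is an ASCII digit; stop at the first one that isn't.
    while i < len(s) and s[i] in '0123456789':
        i += 1
    return s[:i]


print(numeric_prefix('123abc456def'))  # 123
```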

Mazdak
barak manos
  • would regexp be an option? – MaxU - stand with Ukraine Mar 08 '16 at 12:06
  • @MaxU: I was hoping that there would be a simple "string operation" that could save me the burden of regular expression, but if you think that there is no other alternative then yes. – barak manos Mar 08 '16 at 12:09
  • Are all of the prefixes 3 characters or does it vary? – AlG Mar 08 '16 at 12:11
  • @AlG: No, it varies (otherwise, I would have just used `input[0:3]`). – barak manos Mar 08 '16 at 12:12
  • @ForceBru: Thank you. Here below there is an answer more suitable to my question than the accepted answer in the question that you have suggested as duplicate (i.e., the answer below is "cleaner"), so I will accept it here. – barak manos Mar 08 '16 at 12:19
  • I just reopened the question because the suggested duplicates were not correct duplicates of this question. Note that in this question the OP wants a Pythonic answer that extracts the leading digits while the string contains other digits as well. – Mazdak Mar 08 '16 at 12:57
  • How do you define "numeric"? `'0'` - `'9'`? Or all numeric Unicode code points? If you use the latter, you won't be able to parse the prefix as an integer. – CodesInChaos Mar 09 '16 at 08:33
  • @CodesInChaos: Well, I was looking for a way to retrieve the decimal non-negative integer prefix of the string, so basically yes, `'0'` thru `'9'`. – barak manos Mar 09 '16 at 09:06
  • @barakmanos In that case all the `str.isdigit` answers are not what you want. I didn't check, but I suspect the `\d` regex answers are wrong as well. – CodesInChaos Mar 09 '16 at 09:33
  • @CodesInChaos: Can you provide an example please? – barak manos Mar 09 '16 at 09:44
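A small example of the difference CodesInChaos describes, assuming Python 3 `str` inputs. `str.isdigit` is true for characters such as the superscript '²', which `int()` rejects, and `\d` in a `str` pattern matches any Unicode decimal digit (for example the Arabic-Indic '٣') unless the `re.ASCII` flag is given. The sample strings are made up for illustration:

```python
import re
from itertools import takewhile

s = '12\u00b2abc'  # '12²abc': U+00B2 SUPERSCRIPT TWO follows the ASCII digits

# str.isdigit accepts the superscript, so the "prefix" includes it...
prefix = ''.join(takewhile(str.isdigit, s))
print(repr(prefix))
try:
    int(prefix)
except ValueError:
    print('int() rejects', repr(prefix))

# ...while an explicit ASCII check stops before it.
print(''.join(takewhile(lambda c: c in '0123456789', s)))  # 12

t = '\u0663\u0664abc'  # '٣٤abc': ARABIC-INDIC DIGIT THREE and FOUR
print(re.match(r'\d+', t))            # a match object covering the two Arabic-Indic digits
print(re.match(r'\d+', t, re.ASCII))  # None: with re.ASCII, \d means [0-9]
```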

9 Answers


You can use itertools.takewhile, which iterates over your string (the iterable argument) and stops at the first item for which the predicate function returns False:

>>> from itertools import takewhile
>>> input = '123abc456def'
>>> ''.join(takewhile(str.isdigit, input))
'123'
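A sketch that wraps the `takewhile` idea in a helper and converts the result to an `int`. It swaps `str.isdigit` for an ASCII-only membership test, on the assumption that only '0' to '9' should count. The helper name `leading_int` and returning `None` when there is no numeric prefix are hypothetical choices:

```python
from itertools import takewhile
from typing import Optional

ASCII_DIGITS = frozenset('0123456789')


def leading_int(s: str) -> Optional[int]:
    """Return the integer value of the leading ASCII digits of s, or None if there are none."""
    # takewhile stops at the first non-digit, so later digits ('456') are never reached.
    digits = ''.join(takewhile(lambda c: c in ASCII_DIGITS, s))
    return int(digits) if digits else None


print(leading_int('123abc456def'))  # 123
print(leading_int('abc123'))        # None
```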
Matthieu M.
Mazdak
  • Please vote to reopen this question. There are two suggested duplicates. One is a different question (!!!), and the other one is closed by itself, and in addition, the answer here matches my question better than the accepted answer to that (closed) question, since it addresses my requirement for the cleanest possible solution. Thank you. – barak manos Mar 08 '16 at 12:54
  • Thank you very much. Can you please have a look at the solution proposed by @demented hedgehog below? It seems to be very "clean", though I would hate to unaccept your answer. – barak manos Mar 08 '16 at 13:06
  • @barakmanos I left a comment there. It's shorter but not optimal in terms of memory use. It also uses indexing and two `len` calls. – Mazdak Mar 08 '16 at 13:25
  • If we're going to be like that, the question specifically says performance is not important (and it's not always true as well). – demented hedgehog Mar 08 '16 at 14:08
  • Just tested your example versus mine, and yours takes 3 times longer than mine! (using timeit.default_timer(), including the import time for itertools)... and again, performance isn't important for this question – demented hedgehog Mar 08 '16 at 14:13
  • @dementedhedgehog Yes, your second answer is more optimized in terms of runtime in this case. But first off, it doesn't give a general approach, and it is not a good solution in terms of memory use. – Mazdak Mar 08 '16 at 17:58
  • I'll upvote any solution that looks like Haskell. This is definitely the "cleanest" solution in my opinion. – ApproachingDarknessFish Mar 09 '16 at 00:14

This is the simplest way to extract a list of numbers from a string:

>>> import re
>>> input = '123abc456def'
>>> re.findall(r'\d+', input)
['123', '456']

If you need a list of ints, you can use the following code (in Python 3, `map` returns a lazy iterator, hence the `list(...)`):

>>> list(map(int, re.findall(r'\d+', input)))
[123, 456]

You can then take the first element, [0], of that list.

Tal Avissar
  • That's a bold statement. Though I think it's a good solution. – demented hedgehog Mar 08 '16 at 12:19
  • Perhaps `match()` would be more suitable, since OP needs only the starting digits. – user Mar 08 '16 at 12:22
  • For `'xyz123abc456def'` `re.findall` will give you the same result, but `'123'` isn't a prefix of that string! You would need `'^\d*'` as regular expression to find a (possibly empty) prefix. – UlFie May 21 '22 at 19:04
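To make UlFie's point concrete: `re.findall` collects every run of digits anywhere in the string, whereas `re.match` only tries at position 0. With the pattern `\d*`, `match` always succeeds (possibly with an empty match), so `.group()` is safe even when there is no prefix. The strings are illustrative, and `re.ASCII` is added on the assumption that only '0' to '9' should count:

```python
import re

for s in ('123abc456def', 'xyz123abc456def'):
    # findall scans the whole string, so both inputs yield ['123', '456'].
    print(re.findall(r'\d+', s))
    # match is anchored at the start; \d* can match the empty string, so it never returns None.
    print(repr(re.match(r'\d*', s, re.ASCII).group()))
```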

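Simpler version (leaving the other answer up, since there's some interesting debate about which approach is better):

input[:-len(input.lstrip("0123456789")) or None]

The `or None` handles an all-digit input: there `lstrip` returns '', and `input[:-0]` is `input[:0]`, the empty string, instead of the whole string.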
demented hedgehog
input[:len(input) - len(input.lstrip("0123456789"))]
demented hedgehog
  • This is not an optimized approach in terms of memory use (especially when you are dealing with larger strings), because you are creating a stripped copy of the main string and loading it into memory. – Mazdak Mar 08 '16 at 13:23
  • Yes. It's not particularly efficient, but the post specifically doesn't care about performance, it is simple, and in practice the performance usually won't matter. You'd have to be using big strings to care. There's overhead in compiling regular expressions too, for example. If you really care about speed, do it in C. – demented hedgehog Mar 08 '16 at 13:30
  • It also depends on what portion of the string is the prefix: if it's a million digits followed by "x", your approach is going to be slow too. Possibly slower, because you've got the copy as well as a bunch of function call overhead? I'd be interested to see a timing comparison of our two approaches vs. string length and prefix length. – demented hedgehog Mar 08 '16 at 13:38
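A quick check of the `lstrip` idea on edge cases, comparing the negative-index slice `s[:-k]` with the `s[:len(s) - k]` form, where `k` is the length left after stripping the leading digits. The function names are hypothetical. When the whole string is digits, `k` is 0 and `s[:-0]` is `s[:0]`, the empty string, which the length-difference form avoids:

```python
DIGITS = '0123456789'


def prefix_negative_index(s: str) -> str:
    # Slice off as many trailing characters as remain after stripping the leading digits.
    return s[:-len(s.lstrip(DIGITS))]


def prefix_length_difference(s: str) -> str:
    # Keep len(s) - len(rest) characters: exactly the number of leading digits stripped.
    return s[:len(s) - len(s.lstrip(DIGITS))]


for s in ('123abc456def', 'abc', '2016'):
    print(repr(prefix_negative_index(s)), repr(prefix_length_difference(s)))
```

For '2016' the first function returns '' while the second returns '2016'; for the other two inputs they agree.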

Here is my way:

output = input[:next((i for i, v in enumerate(input) if not v.isdigit()), None)]
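The same expression spelled out as a function, to show which edge cases it covers: `next` returns the index of the first non-digit, and its `None` default handles an all-digit string, because `s[:None]` is the whole string. The name `prefix_via_next` is hypothetical:

```python
def prefix_via_next(s: str) -> str:
    # Index of the first character that is not a digit, or None if every character is a digit.
    end = next((i for i, ch in enumerate(s) if not ch.isdigit()), None)
    # s[:None] is the whole string; s[:0] is the empty string.
    return s[:end]


print(prefix_via_next('123abc456def'))  # 123
```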
zondo

One way, though not very efficient since it works through the whole string without breaking early, would be:

input_string = '123abc456def'
[input_string[:c] for c in range(1, len(input_string) + 1) if input_string[:c].isdigit()][-1]

This collects every prefix, in order of increasing length, that consists only of digits, so the last element is the one you are looking for: the longest prefix that is still all digits. If the string does not start with a digit, the list is empty and `[-1]` raises an IndexError.

MSeifert

You could use regex

import re
initialNumber = re.match(r'(?P<number>\d+)', yourInput).group('number')
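The snippet above raises an AttributeError when the input does not start with a digit, because `re.match` then returns `None`. A sketch that guards against that while keeping the named group; the function name `leading_number` and the empty-string fallback are assumptions for illustration:

```python
import re


def leading_number(your_input: str) -> str:
    # \d+ needs at least one digit at the start, so re.match returns None when there is
    # no numeric prefix; check for that instead of calling .group() on None.
    m = re.match(r'(?P<number>\d+)', your_input)
    return m.group('number') if m else ''


print(leading_number('123abc456def'))  # 123
```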
Mr. E
import re
input = '123abc456def'
output = re.findall(r'^\d+', input)

This returns ['123'] too.

Xiflado

Another regexp version strips away everything starting with the first non-digit:

import re
output = re.sub(r'\D.*', '', input)
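One detail worth checking with `re.sub(r'\D.*', '', s)`: `.` does not match a newline, but the removal still reaches the end of a multi-line string, because `re.sub` replaces every non-overlapping match and each remaining line break is itself matched by `\D`. A quick check on made-up inputs, with a hypothetical function name:

```python
import re


def prefix_via_sub(s: str) -> str:
    # Delete from the first non-digit to the end of its line; re.sub keeps going, and any
    # later '\n' is again matched by \D, so everything after the first non-digit is removed.
    return re.sub(r'\D.*', '', s)


for s in ('123abc456def', '123abc\n456def', '\n123', '2016'):
    print(repr(prefix_via_sub(s)))
```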
Marius Gedminas