
I need to extract the trailing digits of a string, since they serve as the index for my strings. The index may be as large as 2^64, so it is not practical to check the last character, then the second-to-last, and so on. A string may look like asdgaf1_hsg534, i.e. it may contain other digits too, but those are somewhere in the middle and are not adjacent to the index I want to get.

Maroun
danny

3 Answers


Here is a method using re.sub:

import re

strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86']

for s in strings:
    # Replace the whole string with just the captured trailing digits.
    print(re.sub(r'.*?([0-9]*)$', r'\1', s))

Output:

534
12
86

Explanation:

The function takes a regular expression, a replacement string, and the string you want to do the replacement on: re.sub(regex,replace,string)

The regex '.*?([0-9]*)$' matches the whole string and captures the digits that immediately precede the end of the string. Parentheses are used to capture the parts of the match we are interested in; \1 refers to the first capture group, \2 to the second, etc.

.*?      # Matches anything (non-greedy) 
([0-9]*) # Zero or more digits (captured)
$        # Followed by the end-of-string identifier 

So we are replacing the whole string with just the captured number we are interested in. In Python the replacement should be written as a raw string, r'\1', so that the backslash is passed through to the regex engine unchanged. If the string doesn't end with digits, an empty string will be returned.


twosixfour = "get_the_numb3r_2_^_64__18446744073709551615"

print(re.sub(r'.*?([0-9]*)$', r'\1', twosixfour))

>>> 18446744073709551615
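
Since the result is a plain string, it can be passed to int(); Python integers have arbitrary precision, so an index as large as 2^64 is not a problem. The following is a minimal sketch, assuming a hypothetical helper named trailing_index (not part of the answer above) that returns None when the string does not end in digits:

```python
import re

def trailing_index(s):
    """Return the trailing digits of s as an int, or None if there are none.

    trailing_index is a hypothetical helper name used only for illustration.
    """
    digits = re.sub(r'.*?([0-9]*)$', r'\1', s)
    # re.sub yields an empty string when s does not end with digits.
    return int(digits) if digits else None

print(trailing_index("asdgaf1_hsg534"))
print(trailing_index("get_the_numb3r_2_^_64__18446744073709551615"))
print(trailing_index("no_digits_here"))
```

Returning None rather than an empty string makes the "no index" case explicit for the caller; that is a design choice of this sketch, not something the answer prescribes.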
Chris Seymour

A simple regex can detect digits at the end of the string:

'\d+$'

$ matches the end of the string. \d+ matches one or more digits. The + operator is greedy by default, meaning it matches as many digits as possible. So this will match all of the digits at the end of the string.
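
One way to apply this pattern is with re.search, which scans the string for the leftmost position where the pattern matches. This is a short sketch; the helper name last_digits is an assumption made for illustration:

```python
import re

def last_digits(s):
    """Return the run of digits at the end of s, or None if there is none.

    last_digits is a hypothetical helper name, not part of the answer.
    """
    # The leftmost position where a greedy \d+ can reach the end of the
    # string is the start of the trailing digit run.
    m = re.search(r'\d+$', s)
    return m.group() if m else None

for s in ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86', 'test']:
    print(s, '->', last_digits(s))
```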

dan1111
  • Thanks for your help. I did it: is_match = re.match(r'(.*)(\D)(\d+)', myString) if is_match: print is_match.group(3) It works – danny Nov 20 '12 at 13:03

If you want to use re.sub and make sure that at least one digit is present at the end of the line, use the + quantifier to match one or more digits (\d+). The pattern then does not match, and the line is left unchanged, when the line contains no digits or when its digits are not at the end.

^.*?(\d+)$
  • ^ Start of line
  • .*? Match any char except a newline, as few times as possible (non-greedy)
  • (\d+) Capture group 1, match 1+ digits
  • $ End of line

Or using a negative lookbehind

^.*(?<!\d)(\d+)$
  • ^ Start of line
  • .* Match any char except a newline, as many times as possible (greedy)
  • (?<!\d)(\d+) Assert no digits directly to the left, then capture 1+ digits in group 1
  • $ End of line
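
To illustrate why the lookbehind matters when .* is greedy, here is a small sketch comparing the pattern with and without it. The variable names are chosen for illustration only:

```python
import re

s = 'asdgaf1_hsg534'

# Greedy .* with no lookbehind consumes as much as it can and leaves
# only the final digit for the capture group.
greedy_only = re.match(r'^.*(\d+)$', s).group(1)

# The lookbehind rejects any split point that has a digit directly to
# its left, so the group has to start where the digit run starts.
with_lookbehind = re.match(r'^.*(?<!\d)(\d+)$', s).group(1)

print(greedy_only)      # 4
print(with_lookbehind)  # 534
```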


When using re.match, you can omit the ^ anchor, and you might also use \A and \Z to assert the start and the end of the string.
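
As a sketch of the re.match variant: in Python, $ also matches just before a single trailing newline, whereas \Z matches only at the very end of the string. The sample inputs below are made up for illustration:

```python
import re

# No ^ is needed: re.match only tries the pattern at the start of the string.
pattern = re.compile(r'.*?(\d+)\Z')

for s in ['asdgaf1_hsg534', 'dgshg_jhfsd86', 'test', 'abc12\n']:
    m = pattern.match(s)
    print(repr(s), '->', m.group(1) if m else None)

# With $ instead of \Z, the string ending in a newline still matches.
dollar_match = re.match(r'.*?(\d+)$', 'abc12\n')
print(dollar_match.group(1))
```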


import re

strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86', 'test']

for s in strings:
    print(re.sub(r".*?(\d+)$", r"\1", s))

Output

534
12
86
test

If a non-digit must be present before the digits, as in the comment above, you could use a negated character class with a single capture group.

^.*[^\d\r\n](\d+)
  • ^ Start of line
  • .* Match any char except a newline, as many times as possible (greedy)
  • [^\d\r\n] Negated character class, match any char except a digit or a newline
  • (\d+) Capture group 1, match 1+ digits
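
A brief sketch of this pattern with re.match follows; the helper name digits_after_non_digit is hypothetical. Note that the pattern as written has no $ anchor, so it captures the last digit run that follows a non-digit even when more text comes after it:

```python
import re

pattern = re.compile(r'^.*[^\d\r\n](\d+)')

def digits_after_non_digit(s):
    """Return the last digit run preceded by a non-digit, or None.

    digits_after_non_digit is a hypothetical name used for illustration.
    """
    m = pattern.match(s)
    return m.group(1) if m else None

print(digits_after_non_digit('asdgaf1_hsg534'))  # 534
print(digits_after_non_digit('534'))             # None: no non-digit before the digits
print(digits_after_non_digit('abc12def'))        # 12: the pattern is not anchored at the end
```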



To get the last digits in the string (not necessarily at the end of the string):

^.*?(\d+)[^\r\n\d]*$
  • ^ Start of line
  • .*? Match any char except a newline, as few times as possible (non-greedy)
  • (\d+) Capture group 1, match 1+ digits
  • [^\r\n\d]* Negated character class, match 0+ times any char except a newline or digit
  • $ End of line
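
The behaviour of this pattern can be sketched as follows, assuming a hypothetical helper named last_digit_run and made-up sample strings:

```python
import re

pattern = re.compile(r'^.*?(\d+)[^\r\n\d]*$')

def last_digit_run(s):
    """Return the last run of digits anywhere in s, or None if s has no digits.

    last_digit_run is a hypothetical name used for illustration.
    """
    m = pattern.match(s)
    return m.group(1) if m else None

print(last_digit_run('asdgaf1_hsg534'))  # 534
print(last_digit_run('ab12cd34ef'))      # 34
print(last_digit_run('test'))            # None
```

Because only non-digits may follow the captured group, the lazy .*? is forced to keep expanding until the group lands on the final digit run.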


The fourth bird