1

I have a string in the format "12345-0012-0123" and I would like to change them all to the format "12345-0012-123", so that the last section after the dash is only three digits instead of four.

In all cases the last section after the dash will have at most three real digits that I need to keep, with a zero in front: 0001, 0012, 0123...

Some strings that I will be editing are already in the correct format, so a quick check to see if I even need to perform the correction would be better...

EDIT: Solved... !!

For anyone interested, this is the ArcGIS field calculator code I am using, modified from the answer provided by anirudh:

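#Convert to three digit count
def FixCount(s):
    # Length of the section after the last dash
    length = len(s[s.rfind('-')+1:])
    if length > 3:
        # rstrip() removes the last section (it stops at the dash); re-append it minus its first digit
        return s.rstrip(s[s.rfind('-')+1:])+s[s.rfind('-')+2:]
    else:
        # Already three digits: rebuild the string unchanged
        return s.rstrip(s[s.rfind('-')+1:])+s[s.rfind('-')+1:]
__esri_field_calculator_splitter__
FixCount(str( !input_field_id! ))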
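For comparison, here is a minimal sketch of the same fix written with `str.rpartition()`, which splits on the last dash directly instead of relying on `rstrip()`. The name `fix_count` is hypothetical, and the sketch assumes the last section is all digits with only leading zeros beyond the three to keep; strings that already end in three digits come back unchanged.

```python
def fix_count(s):
    """Trim the last dash-separated section to three digits (hypothetical helper)."""
    # head = everything before the last dash, sep = '-', tail = the last section
    head, sep, tail = s.rpartition('-')
    if len(tail) > 3:
        # Keep only the last three digits, e.g. '0012' -> '012'
        tail = tail[-3:]
    return head + sep + tail


print(fix_count('12345-0012-0123'))  # 12345-0012-123
print(fix_count('12345-0012-123'))   # unchanged: 12345-0012-123
```

In the field calculator this could stand in for `FixCount` in the code block, called as `fix_count(str( !input_field_id! ))`.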

Moon47
  • When you say you need only 3 digits instead of 4 do you mean 0001 becomes 001 or 0001 becomes 000? Without actually seeing your code I'll just treat the 12345-0012-0123 as an input... – Amazingred Apr 17 '14 at 04:13
  • I want 0001 to become 001... The input will be a field entry in a database... I will use the code in an ArcMap calculation that will repeat the code for every entry in that field or column of data... We use this to change the first part of every string to 5-digit format: `('0000'+!String_ID!)[-14:]`, if the last part is already in the correct 3-digit format... – Moon47 Apr 17 '14 at 04:21
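To make the padding expression in that comment concrete: prepending '0000' and keeping the last 14 characters pads the first section to five digits, but only when the rest of the string is already "-NNNN-NNN" (9 characters). A minimal sketch in plain Python, where `pad_first_section` is a hypothetical name and `s` stands in for `!String_ID!`:

```python
def pad_first_section(s):
    # Assumes s already ends in "-NNNN-NNN" (9 characters), so the last
    # 14 characters are a 5-digit first section plus that suffix.
    return ('0000' + s)[-14:]


print(pad_first_section('345-0012-123'))     # 00345-0012-123
print(pad_first_section('12345-0012-123'))   # 12345-0012-123
# If the last section still has four digits, the slice cuts into the first section:
print(pad_first_section('12345-0012-0123'))  # 2345-0012-0123
```

This is why the last section has to be fixed before the padding step runs.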

5 Answers

4

This is a job for regular expressions!

Given:

>>> s
'12345-0012-0123'

We want to match three parts, capturing two of them:

  • one or more (+) digits (\d) followed by a -, then one or more (+) digits (\d) followed by a -; this is the first capture group
  • then one or more (+) 0s, which we don't capture (no ()s). Remove the + if you only want to match a single 0!
  • one or more (+) digits (\d); this is the second capture group

Then we use re.sub() to replace the part of s that matches this regular expression with the contents of those two capture groups, joined together.

>>> import re
>>> re.sub(r'(\d+-\d+-)0+(\d+)', r'\1\2', s)
'12345-0012-123'

N.B.:

re.sub() returns a new string; it does not modify s in place.
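Following the note about the +, here is a minimal sketch that anchors the pattern to the whole string and matches exactly one leading 0 before exactly three digits, so a four-digit last section loses one zero and a string already ending in three digits is left alone. `LAST_SECTION` and `fix_last_section` are hypothetical names, and the sketch assumes inputs shaped like digits-digits-digits.

```python
import re

# Group 1: everything up to and including the last dash.
# Then a single literal 0 (not captured), then group 2: exactly three digits at the end.
LAST_SECTION = re.compile(r'^(\d+-\d+-)0(\d{3})$')


def fix_last_section(s):
    # sub() only rewrites strings that match; anything else is returned as-is
    return LAST_SECTION.sub(r'\1\2', s)


print(fix_last_section('12345-0012-0023'))  # 12345-0012-023
print(fix_last_section('12345-0012-123'))   # 12345-0012-123
```

This also gives the "quick check" from the question: `LAST_SECTION.match(s)` is truthy only when the last section still has the extra leading zero.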

johnsyweb
  • Oh wow, that's how you match groups in the resulting strings?! I naively thought I could only do it in Sublime Text... – Mdev Apr 17 '14 at 04:19
  • running `re.sub('(\d+-\d+-)0+(\d+)', r'\1\2', '12345-0012-0023')` gives `'12345-0012-23'` not exactly what the author wants... – James Lin Apr 17 '14 at 04:23
  • I am struggling to understand how `r'\1\2'` is used here, care to explain? – James Lin Apr 17 '14 at 04:26
  • @JamesLin: I was adding "Remove the + if you only want to match a single 0!", as you were commenting :-) – johnsyweb Apr 17 '14 at 04:26
  • @JamesLin: `r'\1\2'` is explained in more detail in http://stackoverflow.com/q/8157267/78845 , which I linked as "capture groups" and also in the `re.sub()` documentation to which I linked (they're back references to the first and second capture groups). – johnsyweb Apr 17 '14 at 04:29
  • I'd still like to understand the down-vote, so that I can improve my answer, whoever left it :-) – johnsyweb Apr 17 '14 at 04:30
  • Thanks that seems to make sense, thanks for the description of all that parts too... =] – Moon47 Apr 17 '14 at 04:30
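To see what `\1` and `\2` refer to, the match object's `groups()` shows the text captured by each pair of parentheses; the replacement string just glues those two pieces back together, leaving out the uncaptured zeros.

```python
import re

pattern = r'(\d+-\d+-)0+(\d+)'
m = re.search(pattern, '12345-0012-0123')

# groups() returns the captured text for group 1 and group 2
print(m.groups())              # ('12345-0012-', '123')

# Rebuilding the replacement by hand gives the same result as re.sub()
rebuilt = m.group(1) + m.group(2)
print(rebuilt)                 # 12345-0012-123
print(re.sub(pattern, r'\1\2', '12345-0012-0123'))  # 12345-0012-123
```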
4

This is not necessarily a job for regular expressions!

def reformat(a):
    x = a.split("-")
    x[-1] = "%03d"%int(x[-1])
    return "-".join(x)

Example use:

In [14]: reformat("12345-0012-0001")
Out[14]: '12345-0012-001'
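Because the last section is parsed as an integer and re-padded with `%03d`, the same function also leaves strings that are already in the three-digit format alone, which covers the "quick check" from the question without a separate test. A small run over mixed inputs (the sample values are made up for illustration):

```python
def reformat(a):
    x = a.split("-")
    # Parse the last section as an integer, then zero-pad it back to three digits
    x[-1] = "%03d" % int(x[-1])
    return "-".join(x)


samples = ["12345-0012-0001", "12345-0012-0123", "12345-0012-123", "12345-0012-7"]
for s in samples:
    print(s, "->", reformat(s))
```

Note that a short last section such as "7" is padded up to "007", and the middle section is never touched.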

Timing this against some of the other answers here:

In [55]: %timeit v[:len(v)-4]+str(int(v.split('-')[2]))
100000 loops, best of 3: 1.83 us per loop

In [56]: %timeit reformat(v)
100000 loops, best of 3: 1.99 us per loop

In [57]: %timeit re.sub('(\d+-\d+-)0+(\d+)', r'\1\2', x)
100000 loops, best of 3: 9.53 us per loop

Regular expressions are overkill here: in the timings above, re.sub() is roughly five times slower than the built-in string methods.
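The timings above come from IPython's `%timeit` magic. A rough way to reproduce the comparison in a plain script is the standard `timeit` module; the sample string and loop count are arbitrary choices, and absolute numbers will differ by machine and Python version.

```python
import re
import timeit

v = "12345-0012-0123"
N = 100000  # loops per measurement, as in the %timeit output above


def reformat(a):
    x = a.split("-")
    x[-1] = "%03d" % int(x[-1])
    return "-".join(x)


# The three expressions being compared, wrapped as zero-argument callables
candidates = [
    ("slice + int", lambda: v[:len(v)-4] + str(int(v.split('-')[2]))),
    ("split/join", lambda: reformat(v)),
    ("re.sub", lambda: re.sub(r'(\d+-\d+-)0+(\d+)', r'\1\2', v)),
]

for name, fn in candidates:
    # Sanity check: all three give the same answer for this input
    assert fn() == "12345-0012-123"
    # Best of 3 runs, mirroring IPython's "best of 3" report
    best = min(timeit.repeat(fn, number=N, repeat=3))
    print("%-12s %.2f us per loop" % (name, best / N * 1e6))
```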

ebarr
  • I'm not sure that regular expressions are *overkill* here, if the people maintaining the code are familiar with them, they succinctly describe the transformation. If the people maintaining the code are unfamiliar with regular expressions, or speed is of the essence, a solution like yours may be better. How does this deal with "12345-0012-0001"? ;) – johnsyweb Apr 17 '14 at 04:58
  • @Johnsyweb "after the dash will only have **at most** three real digits that i need to keep". The above code will output `12345-0012-1` for an input of `12345-0012-0001` which appears to be what the OP wants. – ebarr Apr 17 '14 at 05:55
  • "I want 0001 to become 001...", says http://stackoverflow.com/q/23124454/78845#comment35352931_23124454 – johnsyweb Apr 17 '14 at 06:07
  • @Johnsyweb apologies, I missed that. I've updated for the clarified problem description. – ebarr Apr 17 '14 at 06:10
2

You can use a regex as johnsyweb suggested, or, if you'd rather avoid regex, you can use the following:

s = "12345-0012-0123"
length = len(s[s.rfind('-')+1:])
if length > 3:
    print s.rstrip(s[s.rfind('-')+1:])+s[s.rfind('-')+2:]
else:
    print s.rstrip(s[s.rfind('-')+1:])+s[s.rfind('-')+1:]
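One subtlety in the code above: `str.rstrip()` takes a *set* of characters to strip, not a suffix. It removes exactly the last section here only because the dash in front of it is never in that set. A quick illustration (the string "1230123" is a made-up counterexample); slicing up to `rfind('-')` avoids relying on this:

```python
s = "12345-0012-0123"
last = s[s.rfind('-')+1:]        # '0123'

# rstrip removes any trailing characters found in '0123', stopping at '-'
print(s.rstrip(last))            # 12345-0012-

# Without a separator to stop it, rstrip strips every character in the set
print(repr("1230123".rstrip("0123")))  # ''

# Slicing keeps exactly the text up to and including the last dash
print(s[:s.rfind('-')+1])        # 12345-0012-
```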
anirudh
  • That makes more sense to me than the regex, but I will give both a try... Thank you, I was not really sure where to start or how to format the find... =] – Moon47 Apr 17 '14 at 04:28
  • This seems to work the best with the arc gis calculator that I am using it for... – Moon47 Apr 17 '14 at 22:45
0

Where v is the string variable:

One-liner:

v = v[:v.rfind('-')+1] + str(int(v.split('-')[2])).zfill(3)
Amazingred
-2

If the format is fixed, simply do this:

s[:11] + s[12:]

Edit:

This is a more reliable version:

s[:11] + str(int(s[11:])).zfill(3)
Kei Minagawa
  • The last paragraph states "Some strings that i will be editing are already in the correct format so a quick check to see if i even need to perform correction would be better..." – johnsyweb Apr 17 '14 at 04:32
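The fixed-offset slice assumes the last section always has four digits. A minimal sketch of the quick check that comment asks for, still assuming a five-digit first section and a four-digit middle section (so the last section starts at index 11); `fix_fixed_format` is a hypothetical name:

```python
def fix_fixed_format(s):
    # "12345-0012-" is 11 characters, so the last section starts at index 11
    if len(s) == 15:
        # Four-digit last section: drop its first digit (the leading zero) at index 11
        return s[:11] + s[12:]
    # Already three digits (length 14) or an unexpected shape: leave unchanged
    return s


print(fix_fixed_format('12345-0012-0123'))  # 12345-0012-123
print(fix_fixed_format('12345-0012-123'))   # 12345-0012-123
```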