0

Here is the sample text:

initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake

Ideally, the text would look like:

initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake

Here is the code I have so far:

def dashrepl(matchobj):
    print (type(matchobj))
    return re.findall('[0-9]',matchobj)

re.sub(SOP, dashrepl, long_desc_text[22])

But I'm getting the following error:

TypeError: expected string or buffer

Edit updated content:

long_desc_text[22]

SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due for periodic review but the periodic SOP-016248 revision is not tied to any change control records. SOP-020583 tied to a change control record" and notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text

madsthaks
  • 2,091
  • 6
  • 25
  • 46

1 Answers1

1

So, here's my code:

import re

test = "initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake"

regexp = r"SOP-(\d+)(?:v\d+\.\d)?"

test = re.subn(regexp, r"\1", test)

print test[1]

It produces:
"initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake"

Using the python re function "subn" which finds and replaces all examples of a pattern with the specified string - in this case the first capture group. The "r" in front of the string designates it as a regex object.

For reference I also found this link

Hope this helps.

Chromane
  • 175
  • 1
  • 8
  • So I know this isn't what was initially asked but what would I do I wanted to encode the number in a character format. For example, a = 0, b = 1, etc – madsthaks Dec 13 '17 at 05:46