I've run into a regex problem I don't understand. I'm trying to replace a comma between strings with a semicolon, and it's not working. A sample string is below. I wrote a regex that puts everything before the comma after "sequence" into non-capturing groups, expecting that only the comma at the end, the one substring outside the non-capturing groups, would be replaced with the semicolon. But it doesn't work. It only seems to preserve any of the string when I use (?:sequence:) alone as the non-capturing group. As soon as I add \d, it replaces the entire thing. I'm not sure why.

In my real problem, I have a series of content tags that are each marked with a colon and end with a semicolon. In the sequence tag, there's a mistaken comma instead of a semicolon, which I need to replace while leaving everything else unchanged. So the solution should just change sequence:2, to sequence:2;

import re

a_string = "tag1: content1 is this tag2: 0.1 amount; tag3: july 2020; sequence:2, tag4: content4"
new_string = re.sub(r"(?:sequence\:)(?:\d)(\,)", ";", a_string)

new_string

I looked at other solutions that seemed like they should work, but they don't for this. Any help is appreciated, and please let me know if I can make this question any clearer.

tom

1 Answer

You probably intended to use a positive lookbehind here:

import re

a_string = "tag1: content1 is this tag2: 0.1 amount; tag3: july 2020; sequence:2, tag4: content4"
new_string = re.sub(r"(?<=\bsequence:\d)(\,)", ";", a_string)

print(new_string)

This prints:

tag1: content1 is this tag2: 0.1 amount; tag3: july 2020; sequence:2; tag4: content4
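
As a side note on the \b in that pattern: it is a word boundary, so the lookbehind only accepts "sequence" when it starts a word. The sketch below compares the pattern with and without \b. The input containing "subsequence" is a made-up example for illustration, not part of the original data.

```python
import re

# Same lookbehind as above, with and without the leading word boundary.
pattern_with_boundary = r"(?<=\bsequence:\d),"
pattern_without_boundary = r"(?<=sequence:\d),"

# Hypothetical input: "subsequence" merely ends in "sequence".
text = "subsequence:1, sequence:2, tag4: content4"

with_boundary = re.sub(pattern_with_boundary, ";", text)
without_boundary = re.sub(pattern_without_boundary, ";", text)

print(with_boundary)     # subsequence:1, sequence:2; tag4: content4
print(without_boundary)  # subsequence:1; sequence:2; tag4: content4
```

Without \b, the comma after "subsequence:1" would be replaced as well, which may or may not be wanted depending on the real tag names.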

By the way, if you want to match the sequence text before the target comma directly, that's fine, but then replace it as well using a capture group:

a_string = "tag1: content1 is this tag2: 0.1 amount; tag3: july 2020; sequence:2, tag4: content4"
new_string = re.sub(r"(sequence:\d)(\,)", "\\1;", a_string)
print(new_string)   # same as above
Tim Biegeleisen
  • Thanks for such a quick response! Can you help me understand why my solution didn't work? And what do the lookbehind and \b do? I'm still getting my head around regex and you seem to understand it very well. – tom Nov 26 '20 at 23:20
  • Your current approach matches _and_ consumes `sequence:\d`. A non-capturing group still consumes the text it matches, so this text is removed during the replacement, and you only put back a single semicolon. My second version fixes this by capturing the sequence text and reinserting it with `\1`. The first version uses a lookbehind, which matches but does _not_ consume the text. – Tim Biegeleisen Nov 26 '20 at 23:22
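
To make the "matches and consumes" point concrete, here is a minimal sketch that prints what each pattern actually matches in the sample string. It reuses the question's pattern and the lookbehind pattern from the answer; the variable names are only for illustration.

```python
import re

a_string = "tag1: content1 is this tag2: 0.1 amount; tag3: july 2020; sequence:2, tag4: content4"

original = r"(?:sequence\:)(?:\d)(\,)"   # pattern from the question
lookbehind = r"(?<=\bsequence:\d)(\,)"   # pattern from the answer

# group(0) is the full matched text, which is exactly what re.sub replaces.
consumed_original = re.search(original, a_string).group(0)
consumed_lookbehind = re.search(lookbehind, a_string).group(0)

print(repr(consumed_original))    # 'sequence:2,'
print(repr(consumed_lookbehind))  # ','

# With the original pattern, the whole of 'sequence:2,' is swapped for ';'.
result_original = re.sub(original, ";", a_string)
print(result_original)
```

The non-capturing groups only affect group numbering; they do not exclude their text from the match, which is why "sequence:2" disappears from the result.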