Why does Python Regex complain with my substitution?

Question

I am trying to clean up a .csv file in which some strings have an extra "," (comma), and I want to delete that comma. Python reports this error:

TabError: inconsistent use of tabs and spaces in indentation

import sys
import re

with open(sys.argv[1], "r", 500000000) as file_in:
    contents = file_in.readlines()

with open("output.csv", "w") as file_out:
    for line in contents:
        line = re.sub(r"tx->{[0-9a-fA-F]{3},",tx->{[0-9a-fA-F]{3}, line)

My data looks like this:

mps,tx->{67f, 40 34 30 00 00 00 00 00} rx<-{5f3, 4b 34 30 00 88 00 00 00},<S t='n' c='GELE' s='7'/>{Hea

I would like my data to look like this:

mps,tx->{67f 40 34 30 00 00 00 00 00} rx<-{5f3, 4b 34 30 00 88 00 00 00},<S t='n' c='GELE' s='7'/>{Hea

score 0 · Accepted Answer · answered Mar 21 '20 at 01:37

You need to put the part you want to keep in a capture group and then replace the whole match with that group, using the backreference \1:

re.sub(r'(tx->{[0-9a-fA-F]{3}),', r'\1', line)
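To see why the capture group is needed, it can help to run the question's idea next to this fix on the sample line. In the question the replacement argument is not quoted at all, so it is not even valid Python. Even once it is quoted, the replacement passed to re.sub is plain text in which only backreferences such as \1 (and escapes such as \n) are special, so a pattern written there is copied literally into the output. A small sketch, where the variable names are just for illustration:

```python
import re

# Sample line taken from the question.
line = "mps,tx->{67f, 40 34 30 00 00 00 00 00} rx<-{5f3, 4b 34 30 00 88 00 00 00},<S t='n' c='GELE' s='7'/>{Hea"

# The question's idea with the replacement quoted so it parses: the matched
# text "tx->{67f," is replaced by the literal characters of the regex.
wrong = re.sub(r"tx->{[0-9a-fA-F]{3},", r"tx->{[0-9a-fA-F]{3}", line)
print(wrong)

# The fix: group 1 holds "tx->{67f", and \1 puts it back without the comma.
fixed = re.sub(r'(tx->{[0-9a-fA-F]{3}),', r'\1', line)
print(fixed)
```

The first print starts with mps,tx->{[0-9a-fA-F]{3} 40 34, which shows that the ID was replaced with the pattern text. The second print matches the desired output from the question.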

Alternatively, use a positive lookbehind to require that the comma is preceded by tx->{ and three hex digits, and then replace just the comma with nothing:

re.sub(r'(?<=tx->{[0-9a-fA-F]{3}),', '', line)
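A quick check that the lookbehind version gives the same result as the capture-group version on the sample line. The lookbehind only asserts what comes before the comma and does not consume it, so the match is the comma alone and the empty replacement simply deletes it:

```python
import re

line = "mps,tx->{67f, 40 34 30 00 00 00 00 00} rx<-{5f3, 4b 34 30 00 88 00 00 00},<S t='n' c='GELE' s='7'/>{Hea"
expected = "mps,tx->{67f 40 34 30 00 00 00 00 00} rx<-{5f3, 4b 34 30 00 88 00 00 00},<S t='n' c='GELE' s='7'/>{Hea"

# Lookbehind: the match is only the comma, so replacing it with '' removes it.
via_lookbehind = re.sub(r'(?<=tx->{[0-9a-fA-F]{3}),', '', line)

# Capture group: the match includes the prefix, which \1 writes back.
via_group = re.sub(r'(tx->{[0-9a-fA-F]{3}),', r'\1', line)

print(via_lookbehind == expected)  # True
print(via_lookbehind == via_group)  # True
```

The lookbehind form avoids the backreference. Python only accepts fixed-width lookbehinds, and this one always spans eight characters.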

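If you also want to remove the comma in the rx text, replace tx-> in the regexes above with (tx->|rx<-). In the capture-group version the outer group is still group 1, so r'\1' keeps working. In the lookbehind version both alternatives are four characters long, so the lookbehind stays fixed-width, which Python requires.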
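Note also that the loop in the question never writes anything to output.csv. Below is a minimal sketch of the whole script with the combined tx/rx pattern. It assumes, as in the question, that the input path is the first command-line argument and that the result goes to output.csv. The names EXTRA_COMMA, fix_line and main are hypothetical. The sketch uses a non-capturing group (?:...) so that group 1 is unambiguously the text to keep, and it escapes the opening brace as \{. That escape is not strictly needed, but it keeps { from being read as the start of a {m,n} quantifier right after a group.

```python
import re
import sys

# Hypothetical name: one compiled pattern covering both tx-> and rx<- blocks.
# (?:...) does not capture, so group 1 is "tx->{67f" or "rx<-{5f3".
EXTRA_COMMA = re.compile(r'((?:tx->|rx<-)\{[0-9a-fA-F]{3}),')


def fix_line(line: str) -> str:
    """Drop the comma that follows the three-digit hex ID in tx/rx blocks."""
    return EXTRA_COMMA.sub(r'\1', line)


def main(path: str) -> None:
    # Stream the input line by line and write every edited line out,
    # which the loop in the question never did.
    with open(path, "r") as file_in, open("output.csv", "w") as file_out:
        for line in file_in:
            file_out.write(fix_line(line))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: python script.py INPUT.csv", file=sys.stderr)
```

Only the comma directly after a tx/rx ID is touched. The comma after the closing } of the rx block is left alone, because it is not preceded by tx->{ or rx<-{ and three hex digits.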
