I have lots of XML files whose keys are digits, e.g. <12345>Golly</12345>.
When parsing them with ElementTree, I get the error "not well-formed (invalid token)".
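The XML 1.0 specification requires an element name to start with a letter, underscore, or colon (or certain other Unicode name-start characters), never a digit, so a tag like <12345> is rejected before any content is read. A minimal check of this, assuming the elements sit inside some wrapping root element, might look like the sketch below; the helper is_well_formed and the "id_" prefix are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if ElementTree can parse xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# An element name that starts with a digit triggers "not well-formed (invalid token)".
print(is_well_formed("<root><12345>Golly</12345></root>"))        # False
# The same content under a name that starts with a letter (hypothetical "id_" prefix) parses.
print(is_well_formed("<root><id_12345>Golly</id_12345></root>"))  # True
```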
I assume this is because the keys are digits rather than words. When I try to turn the keys into strings by wrapping them in double quotes with a regex:
xmlstr = re.sub(r'<(\d+)>', '<"' + re.search(r'<(\d+)>', xmlstr).group(1) + '">', xmlstr)
xmlstr = re.sub(r'</(\d+)>', '</"' + re.search(r'</(\d+)>', xmlstr).group(1) + '">', xmlstr)
all the other keys get replaced with the first key found, so every key ends up the same, even though the keys in each original document are unique. I guess the files were converted directly from JSON to XML. The keys should represent ID numbers, and the values are the names associated with those IDs.
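Two things seem to be going on. First, wrapping the digits in double quotes would not help even if each tag kept its own number, because " is not allowed anywhere in an XML name; a letter or underscore prefix is needed instead. Second, the replacement string is built once from re.search, so every match receives the same text. A backreference in the replacement template (\g<1>, \g<2>) is filled in separately for each match. The sketch below assumes tags are plain <digits> or </digits> with no attributes or self-closing form; the function name and the "id_" prefix are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches "<12345>" and "</12345>": group 1 is the optional slash, group 2 the digits.
NUMERIC_TAG = re.compile(r"<(/?)(\d+)>")

def rename_numeric_tags(xml_text: str, prefix: str = "id_") -> str:
    # \g<1> and \g<2> are substituted from *each* match, so every tag keeps its own number.
    return NUMERIC_TAG.sub(rf"<\g<1>{prefix}\g<2>>", xml_text)

original = "<root><12345>Golly</12345><678>Molly</678></root>"
fixed = rename_numeric_tags(original)
print(fixed)  # <root><id_12345>Golly</id_12345><id_678>Molly</id_678></root>
print(ET.fromstring(fixed).find("id_678").text)  # Molly
```

The \g<n> form is used instead of \1 so that a prefix ending in a digit could not be misread as part of the group number.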
I was wondering if there is a way to work with digits as keys, or a way to replace the keys one by one instead of replacing every match with a single string. The problem is that .group(1) on the re.search result only returns the first occurrence, and that one value is then used as the replacement for every match.
Please help.