I'm running into an error when trying to use re.sub: (<class 're.error'>, error('bad escape \S at position 51'), <traceback object at 0x00000230E5F63580>)
My code is probably a mess, because I started to learn Python a week ago. It runs through certain (XML) files in a given source folder, copies a part of each file based on a regex, and pastes it into a file with the same name in a given target folder, replacing the existing part matched by the same regex. It works fine, but as soon as the replacement string contains \S (which can happen, because it is a path), the replacement fails.
Here is my code (sorry for the mess):
import re
import os
from tkinter import messagebox
from tkinter import simpledialog

source_input = simpledialog.askstring(title="Quelldateien", prompt="Pfad zu den Quelldateien:\t\t\t\t\t\t\t\t\t")
target_input = simpledialog.askstring(title="Zieldateien", prompt="Pfad zu den Zieldateien:\t\t\t\t\t\t\t\t\t")
search_pattern = re.compile("<reference>.*?</reference>", re.DOTALL)

for path, subdirs, files in os.walk(source_input):
    for filename in files:
        if filename.endswith(".sdlxliff"):
            source_file = open(path + os.sep + filename, 'r', encoding="utf8")
            source_content = source_file.read()
            source_file.close()
            source_reference = re.search(search_pattern, source_content)
            source_reference_string = source_reference.group(0)
            target_path = path.replace(source_input, target_input)
            if os.path.exists(target_path + os.sep + filename):
                target_file = open(target_path + os.sep + filename, 'r', encoding="utf8")
                target_content = target_file.read()
                target_file.close()
                newdata = re.sub(search_pattern, source_reference_string, target_content)
                target_file = open(target_path + os.sep + filename, 'w', encoding="utf8")
                target_file.write(newdata)
                target_file.close()

messagebox.showinfo(title="Erledigt", message="Der Referenzteil wurde ersetzt.")
The replacement string in re.sub (source_reference_string variable) looks like this:
<reference><external-file href="file://C:\\_Projekte\\S$$$\\220909_error_ZV\\en-US\\$$$$ - Kopie.pptx" uid="Pptx.DependencyFileId"/></reference>
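The error can be reproduced without any files. Below is a minimal sketch, assuming a shortened, made-up replacement string; the path and the sample target text are placeholders, not my real data. The real string presumably contains single backslashes (the doubled ones above look like an artifact of how the value was displayed), so the sketch uses a raw string with single backslashes.

```python
import re

# Same pattern as in the script above.
search_pattern = re.compile("<reference>.*?</reference>", re.DOTALL)

# Hypothetical, shortened stand-in for the real replacement string.
replacement = r'<reference>C:\_Projekte\S123</reference>'

# Hypothetical target content containing a block to replace.
target_content = "<doc><reference>old</reference></doc>"

try:
    re.sub(search_pattern, replacement, target_content)
    error_message = None
except re.error as exc:
    # re.sub treats backslash + ASCII letter in the replacement as an escape
    # sequence and rejects the ones it does not know, such as \S.
    error_message = str(exc)

print(error_message)
```

With this stand-in the failure is the same kind as in my script: the replacement string is parsed as a template, and \S is not a valid escape there.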
I found the thread "Python 3.7.4: 're.error: bad escape \s at position 0'" and tried replacing re with the regex module, but I ran into the same error.
I would like re.sub to just take the replacement string without interpreting any backslashes.
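To make clear what behaviour I am after, here is a sketch that passes a function instead of a string as the replacement. As far as I understand the re documentation, the string returned by such a function is inserted as-is, without escape processing. The replacement and target strings are again made-up placeholders, and I am not sure this is the idiomatic way to do it.

```python
import re

search_pattern = re.compile("<reference>.*?</reference>", re.DOTALL)

# Hypothetical, shortened stand-in for the real replacement string.
replacement = r'<reference>C:\_Projekte\S123</reference>'

# Hypothetical target content containing a block to replace.
target_content = "<doc><reference>old</reference></doc>"

# The lambda ignores the match object and returns the replacement unchanged,
# so the backslashes are not interpreted as escapes.
newdata = search_pattern.sub(lambda match: replacement, target_content)

print(newdata)
```

In this sketch the backslashes end up in newdata unchanged, which is the result I would like to get for the real files.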
Thanks for any help.