I am a beginner at Python and have run into a coding problem that I can't solve.
What I have:
the source sentences and their respective translations, in two columns of a spreadsheet;
the HTML code, which contains the sentences and HTML tags.
What I'm trying to do: use Python's regex method sub() to find the English sentences and replace them with their respective translations.
For example, the HTML code contains three sentences: Pumas are large animals. They are found in America. They don't eat grass.
I have a translation for each sentence in the HTML code. I want to replace the sentences one at a time while keeping the HTML tags. Normally I can use the sub() method like this:
regex1 = re.compile(r'(\>.*)SOURCE_SENTENCE_HERE ?(.*\<)')
resultCode = regex1.sub(r'\1TRANSLATION_SENTENCE_HERE\2', originalHtmlCode)
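To make the intent concrete, here is a minimal, self-contained sketch of that pattern applied to the Pumas example. The HTML snippet and the French translation are made up for illustration and are not taken from my real file; re.escape() is an addition so that the full stop in the sentence is matched literally.

```python
import re

# Hypothetical HTML snippet and translation, for illustration only.
original_html = "<p>Pumas are large animals.</p><p>They are found in America.</p>"
source = "Pumas are large animals."
translation = "Les pumas sont de grands animaux."

# re.escape() keeps the '.' in the sentence from acting as a wildcard.
pattern = re.compile(r"(>.*)" + re.escape(source) + r" ?(.*<)")

# \1 and \2 put back whatever surrounded the sentence, so the tags survive.
result = pattern.sub(r"\1" + translation + r"\2", original_html)
print(result)
# <p>Les pumas sont de grands animaux.</p><p>They are found in America.</p>
```

Only the first sentence is replaced here; the tags and the second sentence are carried over unchanged through the two groups.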
I've written a Python script to do this. I save the HTML code in a txt file and read it in my Python code (succeeded). Then I create a dictionary to store the source-target pairs from the spreadsheet mentioned above (succeeded). Lastly, I use the regex sub() method to find and replace the sentences in the HTML code (failed). This last part didn't work at all for some reason. Link to my Python code - https://pastebin.com/ZSUNB4yg or below:
import re, openpyxl, pyperclip
buynavFile = open('C:\\Users\\zs\\Documents\\PythonScripts\\buynavCode.txt')
buynavCode = buynavFile.read()
buynavFile.close()
wb = openpyxl.load_workbook('buynavSegments.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
segDict = {}
maxRow = sheet.max_row
for i in range(2, maxRow + 1):
    segDict[sheet.cell(row=i, column=3).value] = sheet.cell(row=i, column=4).value
for k, v in segDict.items():
    k = '(\\>.*)' + str(k) + ' ?(.*\\<)'
    v = '\\1' + str(v) + '\\2'
    buynavRegex = re.compile(k)
    buynavResult = buynavRegex.sub(v, buynavCode)
pyperclip.copy(buynavResult)
print('Result copied to clipboard')
Error message below:
Traceback (most recent call last):
  File "C:\Users\zs\Documents\PythonScripts\buynav.py", line 20, in <module>
    buynavResult = buynavRegex.sub(v, buynavCode)
  File "C:\Users\zs\AppData\Local\Programs\Python\Python36\lib\re.py", line 326, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Users\zs\AppData\Local\Programs\Python\Python36\lib\re.py", line 317, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "C:\Users\zs\AppData\Local\Programs\Python\Python36\lib\sre_parse.py", line 943, in parse_template
    addgroup(int(this[1:]), len(this) - 1)
  File "C:\Users\zs\AppData\Local\Programs\Python\Python36\lib\sre_parse.py", line 887, in addgroup
    raise s.error("invalid group reference %d" % index, pos)
sre_constants.error: invalid group reference 11 at position 1
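A minimal sketch that appears to reproduce the same error message, assuming one of the translations begins with the digit 1 (this is an assumption; the spreadsheet contents are not shown here). The HTML snippet and the translation string are hypothetical. For contrast, it also shows the \g<1> spelling of a group reference from the re documentation, which is not followed by an ambiguous digit.

```python
import re

# Hypothetical one-sentence HTML snippet, for illustration only.
html = "<p>Pumas are large animals.</p>"
pattern = re.compile(r"(>.*)" + re.escape("Pumas are large animals.") + r" ?(.*<)")

# Hypothetical translation that happens to start with a digit.
translation = "1 puma is a large animal."

# "\\1" + "1 puma..." becomes "\11 puma...", which is read as group 11,
# and the pattern only has two groups.
try:
    pattern.sub("\\1" + translation + "\\2", html)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# \g<1> refers to group 1 no matter which character follows it.
fixed = pattern.sub("\\g<1>" + translation + "\\g<2>", html)
print(fixed)
```

If the message printed by the first part matches the one in the traceback, a translation starting with a digit is a likely trigger, though other causes are not ruled out by this sketch.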
Could someone enlighten me on this please? I would really appreciate it.