I have two words:
word_1 = 'وتعالی'
word_2 = ':'
The words can be anything, mostly Unicode letters and punctuation,
and I want to find them inside a long multiline string like this:
string = <Hadith><Document> قال:</Document> و سأله<Person> الحسین بن أسباط</Person> - و أنا أسمع - عن الذبیح <Prophet> إسماعیل</Prophet> أو<Prophet> إسحاق</Prophet> ؟ فقال:«<Prophet> إسماعیل</Prophet> ، أما سمعت قول الله تبارک و تعالی:«<Ayah WordIndex="۱-۲" Soreh="۳۷" Name="الصافات" Ayah="۱۱۲" AyahList="۱۱۲"> و بشرناه </Ayah> » «<Ayah WordIndex="۳-۳" Soreh="۳۷" Name="الصافات" Ayah="۱۱۲" AyahList="۱۱۲"><Prophet> بإسحاق</Prophet> </Ayah> » .</Hadith>
فقال : « إسماعیل ، أما سمعت قول الله
تبارک وتعالی :
This means that <xml>وتعالی</xml> :
should match my regex, but وتعالی الی صتیدشنتد :
should not match.
If the Arabic characters are annoying, let me give an example with English letters:
words:
word_1 = "hello"
word_2 = "there"
This should match: hello<xml>: there</xml>:
but this one: hello<xml>guys there</xml>
should not.
As you can see, I want to find the index of the end of the two words using this pattern:
(any amount of whitespace or nothing, any punctuation or nothing, any XML tag or nothing, {word_1}, any XML tag or nothing, any punctuation or nothing, {word_2}, any amount of whitespace or nothing, any punctuation or nothing, any XML tag or nothing)
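To make the description above concrete, here is a minimal sketch of how it might be written with Python's re module. The names FILLER and build_pattern are made up for illustration, and I am assuming that "punctuation" means any character that is not a word character, whitespace, or an angle bracket, and that whitespace is also allowed between the two words (the matching example needs it).

```python
import re

# Filler allowed around and between the words, repeated zero or more times:
# a whitespace character, one XML tag, or one punctuation character
# (assumed here to be anything that is not \w, whitespace, or < >).
FILLER = r"(?:\s|<[^<>]*>|[^\w\s<>])*"

def build_pattern(word_1, word_2):
    # Group 1 captures word_1 and group 2 captures word_2.
    return re.compile(
        FILLER
        + "(" + re.escape(word_1) + ")"
        + FILLER
        + "(" + re.escape(word_2) + ")"
        + FILLER
    )

pattern = build_pattern("hello", "there")

m = pattern.search("hello<xml>: there</xml>:")
print(m is not None)  # True
print(m.end(2))       # 17, the index right after "there"

# "guys" is made of word characters, so the filler cannot skip it.
print(pattern.search("hello<xml>guys there</xml>"))  # None
```

Because the filler never accepts letters or digits, any extra word between word_1 and word_2 prevents a match, which is the behaviour wanted for the second example.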
I tried many patterns suggested by ChatGPT, but none of them worked for me.
These are the patterns:
fr'\s*[^\w]*<.*?>{re.escape(last_words[0])}<.*?>{re.escape(last_words[1])}[^\w]*<.*?>\s*'
fr'<.*?>{re.escape(word2)}<.*?>{re.escape(word1)}<.*?>'
Neither of them worked.
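As far as I can tell, one reason these patterns fail is that every <.*?> in them has to match an actual tag, so a tag is required in each position rather than being optional. A small sketch with the English example (the variable names strict, optional_tag and loose are only for illustration):

```python
import re

sample = "hello<xml>: there</xml>:"

# Same shape as the first pattern above: each <.*?> must match a real tag,
# including one directly before "hello", which the sample does not have.
strict = re.compile(
    r"\s*[^\w]*<.*?>" + re.escape("hello")
    + r"<.*?>" + re.escape("there") + r"[^\w]*<.*?>\s*"
)
print(strict.search(sample))  # None

# Wrapping the tag in (?:...)? makes it optional instead of required.
optional_tag = r"(?:<[^<>]*>)?"
loose = re.compile(optional_tag + re.escape("hello") + optional_tag)
print(loose.search(sample).group())  # hello<xml>
```

The second pattern also puts word2 before word1, so it looks for the words in the reverse order.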
I want to find the index of the last two words of the string and then add something there. For example, I want the index of ":" in the string, because my two words are و تعالی and ":", and I want to add an XML tag right after the ":".
I don't want to remove any XML tags or punctuation from the text (if they can be removed and then put back afterwards, that's OK).
Sorry in advance for my bad English.
Any help will be appreciated!
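A rough sketch of the insertion step, under the same assumptions about what counts as punctuation. The function name insert_after_words and the tag <Mark/> are hypothetical, and I am assuming that the last occurrence in the string is the one that matters:

```python
import re

# Whitespace, XML tags, or punctuation characters, zero or more times.
FILLER = r"(?:\s|<[^<>]*>|[^\w\s<>])*"

def insert_after_words(text, word_1, word_2, addition):
    """Insert `addition` right after word_2 in the last place where
    word_1 and word_2 are separated only by filler."""
    pattern = re.compile(
        re.escape(word_1) + FILLER + "(" + re.escape(word_2) + ")"
    )
    matches = list(pattern.finditer(text))
    if not matches:
        return text  # nothing found, leave the text untouched
    end = matches[-1].end(1)  # index right after word_2
    return text[:end] + addition + text[end:]

text = "say hello<xml>: there</xml>: and more"
print(insert_after_words(text, "hello", "there", "<Mark/>"))
# say hello<xml>: there<Mark/></xml>: and more
```

Nothing is removed from the original text here. The string is only sliced at the index where word_2 ends and the new tag is placed between the two halves.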