I need help with a regular expression. I have XML text like this:
<w><ana lex="совершенно" gr="ADV"></ana>соверш`енно</w>
and I need to extract совершенно, ADV and соверш`енно. I have tried, but I am not very good with regular expressions.
In your case it is better to use BeautifulSoup instead of regular expressions.
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4 on Python 3
>>> xml = '<w><ana lex="совершенно" gr="ADV"></ana>соверш`енно</w>'
>>> soup = BeautifulSoup(xml, 'html.parser')
>>> # find the first <ana> tag that has a lex attribute
>>> print(soup.find('ana', {'lex': True}).get('lex'))
совершенно
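If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's xml.etree.ElementTree (a swapped-in alternative, not the library used above). This assumes each fragment is well-formed XML with a single <ana> child; the helper name extract_word is made up for illustration.

```python
import xml.etree.ElementTree as ET

xml = '<w><ana lex="совершенно" gr="ADV"></ana>соверш`енно</w>'

def extract_word(fragment):
    """Return (lex, gr, surface form) from one <w> fragment (hypothetical helper)."""
    w = ET.fromstring(fragment)   # parse the <w> element
    ana = w.find('ana')           # first <ana> child
    # The text that follows </ana> is stored as the "tail" of the <ana> element.
    form = (ana.tail or '').strip()
    return ana.get('lex'), ana.get('gr'), form

lex, gr, form = extract_word(xml)
print(lex, gr, form)
```

The attributes come from ana.get(), and the stressed form comes from ana.tail because it sits after the closing </ana> tag rather than inside it.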
The following uses re.search() from Python's regular expression module (re) to find the position of the text you are looking for.
import re
# single quotes outside, so the double quotes inside the XML need no escaping
text = '<w><ana lex="совершенно" gr="ADV"></ana>соверш`енно</w>'
data = re.search("соверш`енно", text)
re.search() returns a match object (or None if nothing is found); data.span() gives the position of your string in the text. You can extract the other strings in the same way.
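To pull out all three values in one go, capture groups can be used. This is only a minimal sketch: it assumes the attributes always appear in the order lex, gr, with double quotes and no extra whitespace, exactly as in the sample line. The pattern itself is my own, not something from the question.

```python
import re

text = '<w><ana lex="совершенно" gr="ADV"></ana>соверш`енно</w>'

# Group 1: value of lex, group 2: value of gr, group 3: text between </ana> and </w>.
# Assumes the exact attribute order and quoting of the sample line.
pattern = re.compile(r'<ana lex="([^"]*)" gr="([^"]*)"></ana>([^<]*)</w>')

match = pattern.search(text)
if match:
    lex, gr, form = match.groups()
    print(lex, gr, form)
```

If the real file varies the attribute order or spacing, a pattern like this will silently miss entries, which is the usual reason a parser is preferred for XML.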