First, to answer your question directly:
I do it by examining each 'word' consisting of a sequence of characters containing (mainly) alphabetics or '<' or '>'. When the regex offers them to some_only
I look for one of the latter two characters. If neither appears I print the 'word'.
>>> import re
>>> strTest = "<random xml>hello this was successful price<random xml>"
>>> def some_only(matchobj):
... if '<' in matchobj.group() or '>' in matchobj.group():
... pass
... else:
... print (matchobj.group())
... pass
...
>>> ignore = re.sub(r'[<>\w]+', some_only, strTest)
this
was
successful
This works for your test string; however, as others have already mentioned, using a regex on xml will usually lead to many woes.
To use a more conventional approach I had to tidy away a couple of errors in that xml string, namely to change random xml
to random_xml
and to using a proper closing tag.
I prefer to use the lxml library.
>>> strTest = "<random_xml>hello this was successful price</random_xml>"
>>> from lxml import etree
>>> tree = etree.fromstring(strTest)
>>> tree.text
'hello this was successful price'
>>> tree.text.split(' ')[1:-1]
['hello', 'this', 'was', 'successful', 'price']
>>> tree.text.split(' ')[1:-1]
['this', 'was', 'successful']