0

I have an xml file like this:

<string name="abc_searchview_description_clear">Clear query</string>
<string name="abc_searchview_description_query">Search query</string>
<string name="abc_searchview_description_search">Search</string>
<string name="abc_searchview_description_submit">Submit query</string>
<string name="abc_searchview_description_voice">Voice search</string>
<string name="abc_shareactionprovider_share_with">Share with</string>

and I want to find all the string values i.e., [Clear query, Search query, Search, Submit query, Voice search, Share with] in python3.

I'm just wondering if it is possible to draw regular expression for it? {similar to what we do to find all the text inside a double qoute [Find_double_quotes = re.compile('"([^"]*)"',re.DOTALL | re.MULTILINE | re.IGNORECASE)}

2 Answers2

1

If your use case is as simple as the one shown here (i.e. no nesting, no other tags), this should do it re.findall("<string[^>]*>(.*)</string>", your_text)

Although, XML is not a regular language, so the regex will fail for some more general cases. If you need it for more complex cases, you may consider not using regex for this. See this question with a more detailed explanation.

pdpino
  • 444
  • 4
  • 13
  • And do remember that every time a recipient of XML writes code that can only handle a restricted subset of XML (e.g. XML with no comments in), some supplier of that XML is soon going to raise a question on StackOverflow asking how they can generate XML that conforms to that restricted subset. If this code is for live production use, then use a proper XML parser. – Michael Kay Apr 13 '19 at 08:08
0

Many Thanks, it works fine for me.

I also try this: Find_double_quotes = re.compile('">([^>]*)

It also produce the same result