Don't parse XML/HTML with regex; use a proper XML/HTML parser and a powerful XPath query.
theory :
According to compiler theory, XML/HTML can't be parsed with regular expressions, which are based on finite state machines. Because XML/HTML is built from arbitrarily nested elements, you need a pushdown automaton, for example a parser generated from a LALR grammar with a tool like YACC.
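To make the point concrete, here is a minimal sketch comparing a regex with a real parser on nested elements. The sample document and the `item` element name are made up for illustration, and it uses Python's standard library `xml.etree.ElementTree` only so that it runs without installing anything.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: an <item> nested inside another <item>.
DOC = "<root><item>outer<item>inner</item>tail</item></root>"

# A non-greedy regex stops at the FIRST closing tag it sees, so the
# outer element is cut short: the regex cannot track nesting depth.
regex_match = re.search(r"<item>(.*?)</item>", DOC).group(0)

# A parser keeps track of open elements (pushdown behaviour), so the
# outer element comes back whole, with its child intact.
root = ET.fromstring(DOC)
outer = root.find("item")
parsed = ET.tostring(outer, encoding="unicode")

print(regex_match)  # <item>outer<item>inner</item>
print(parsed)       # <item>outer<item>inner</item>tail</item>
```

A greedy pattern would not help either: it fails in the opposite direction as soon as two sibling elements appear on the same line.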
realLife©®™ everyday tool in a shell :
You can use one of the following:
xmllint: often installed by default with libxml2, xpath1 (check my wrapper to get newline-delimited output)
xmlstarlet: can edit, select, transform... Not installed by default, xpath1
xpath: installed via perl's module XML::XPath, xpath1
xidel: xpath3
saxon-lint: my own project, a wrapper over @Michael Kay's Saxon-HE Java library, xpath3
Or you can use a high-level language with a proper library. I think of:
python's lxml (from lxml import etree)
perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath
ruby's nokogiri, check this example
php's DOMXpath, check this example
Check: Using regular expressions with HTML tags
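As a rough illustration of the parser + XPath approach from a high-level language, here is a small sketch. It is an assumption-laden example, not the only way to do it: the sample document is invented, and it uses the standard library's `xml.etree.ElementTree` (which only supports a limited XPath subset) instead of lxml so that it runs as-is.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; Init and max_value mirror the names
# used in the commands of this answer.
DOC = """<config>
  <Init max_value="10" name="a"/>
  <Group><Init max_value="20" name="b"/></Group>
  <Init name="c"/>
</config>"""

root = ET.fromstring(DOC)

# ".//Init[@max_value]" selects every Init element below the root, at
# any depth, that carries a max_value attribute.
values = [el.get("max_value") for el in root.findall(".//Init[@max_value]")]

print(values)  # ['10', '20']
```

With lxml the same idea is spelled `root.xpath("//Init/@max_value")`, which gives you full XPath 1.0.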
xmlstarlet ed -u '//Init/@max_value' -v '100' *.xml
If you want to edit in place, use the -L switch:
xmlstarlet ed -L -u '//Init/@max_value' -v '100' *.xml
Example using xpath & python to edit in place
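# Edit an XML file in place: set max_value="100" on every Init element
# that already has the attribute (same effect as the xmlstarlet command).
import sys

from lxml import etree

xml_path = sys.argv[1]
tree = etree.parse(xml_path)

# The [@max_value] predicate skips Init elements without the attribute,
# which would otherwise raise KeyError on lookup.
for node in tree.xpath("//Init[@max_value]"):
    node.set("max_value", "100")

tree.write(xml_path, pretty_print=True)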
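The xmlstarlet commands above work on `*.xml`, while the Python script takes a single file. Here is a sketch of how the same edit could be looped over several files. The helper name `set_max_value` and the sample files are hypothetical, and it relies on the standard library's `xml.etree.ElementTree` rather than lxml so it can run anywhere (so no `pretty_print` here).

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def set_max_value(path, new_value="100"):
    """Set max_value on every Init element that already has it.

    Returns the number of elements updated; the file is rewritten
    only when at least one element matched.
    """
    tree = ET.parse(path)
    # iter() also visits the root element, unlike a ".//Init" search.
    matches = [el for el in tree.iter("Init") if "max_value" in el.attrib]
    for el in matches:
        el.set("max_value", new_value)
    if matches:
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return len(matches)


# Demo on throwaway files (made-up content) in a temporary directory.
workdir = tempfile.mkdtemp()
samples = {
    "a.xml": '<cfg><Init max_value="1"/><Init/></cfg>',
    "b.xml": "<cfg><Other/></cfg>",
}
for name, content in samples.items():
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as fh:
        fh.write(content)

counts = {
    os.path.basename(p): set_max_value(p)
    for p in sorted(glob.glob(os.path.join(workdir, "*.xml")))
}
print(counts)  # {'a.xml': 1, 'b.xml': 0}
```

Files with no matching element are left untouched, which avoids needless rewrites when you run it over a whole directory.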