3

I'm using this answer here: https://stackoverflow.com/a/18622953/1797263 to replace a version in a pom.xml file. The problem I'm running into is that it is stripping the preceding whitespace and I want to keep the preceding whitespace. The whitespace could be 2 or 3 tabs or spaces, depending on how the developer formatted the file.

Here is an example:

        <dependency>
            <groupId>GROUP</groupId>
            <artifactId>ARTIFACT</artifactId>
            <version>OLD_VERSION</version>
        </dependency>

My command: sed -i '/<artifactId>ARTIFACT<\/artifactId>/!b;n;c<version>NEW_VERSION</version>' pom.xml

And my output:

        <dependency>
            <groupId>GROUP</groupId>
            <artifactId>ARTIFACT</artifactId>
<version>NEW_VERSION</version>
        </dependency>

Here is what I would like the replacement to look like:

        <dependency>
            <groupId>GROUP</groupId>
            <artifactId>ARTIFACT</artifactId>
            <version>NEW_VERSION</version>
        </dependency>

I read through the GNU Sed manual and could not find anything that would help.

Chris Savory
  • 3
    [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Dec 24 '19 at 15:41

2 Answers

2

Using a proper parser :

xmlstarlet edit -L -u '/dependency/version' -v NEW_VERSION file.xml

Output:

<?xml version="1.0"?>
<dependency>
  <groupId>GROUP</groupId>
  <artifactId>ARTIFACT</artifactId>
  <version>NEW_VERSION</version>
</dependency>

Don't parse XML/HTML with regex, use a proper XML/HTML parser and a powerful query.

theory :

According to compiler theory, XML/HTML cannot be parsed with regular expressions, which are based on finite-state machines. Because XML/HTML is hierarchically nested, you need a pushdown automaton, for example a parser built from an LALR grammar with a tool like YACC.
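To make the caveat concrete, here is a minimal sketch of how a line-based rewrite can go wrong. It assumes a hypothetical dependency block (not from the question) in which a <scope> line sits between <artifactId> and <version>, and it uses a sed script of the "replace the line after the match" kind:

```shell
# Hypothetical input: <scope> sits between <artifactId> and <version>.
tmp=$(mktemp)
printf '%s\n' \
  '<dependency>' \
  '    <artifactId>ARTIFACT</artifactId>' \
  '    <scope>test</scope>' \
  '    <version>OLD_VERSION</version>' \
  '</dependency>' > "$tmp"

# Line-based rewrite: replace whatever line follows the artifactId line,
# keeping that line's leading whitespace.
broken=$(sed '/<artifactId>ARTIFACT<\/artifactId>/{
n
s/^\([[:space:]]*\).*/\1<version>NEW_VERSION<\/version>/
}' "$tmp")

printf '%s\n' "$broken"
rm -f "$tmp"
```

With this input the <scope> line is overwritten and the OLD_VERSION line is left in place, because sed only knows about line order, not about element structure.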

realLife©®™ everyday tool in a shell :

You can use one of the following :

xmllint often installed by default with libxml2, xpath1 (check my wrapper to have newlines delimited output)

xmlstarlet can edit, select, transform... Not installed by default, xpath1

xpath installed via perl's module XML::XPath, xpath1

xidel xpath3

saxon-lint my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3

or you can use high level languages and proper libs, I think of :

Python's lxml (from lxml import etree)

Perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath

, check this example

PHP's DOMXpath, check this example


Check: Using regular expressions with HTML tags


Gilles Quénot
1

This might work for you (GNU sed):

sed -i '/<artifactId>ARTIFACT<\/artifactId>/{n;s/\S.*/<version>NEW_VERSION<\/version>/}' file

This overwrites the old version line starting at its first non-whitespace character (matched by \S), so the leading indentation is left untouched.
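A variant of the same idea, as a sketch rather than a definitive fix: capture the leading whitespace explicitly and put it back in front of the new element. It uses the POSIX class [[:space:]] instead of the GNU-specific \S, and runs against a temporary sample file (hypothetical, mirroring the snippet in the question) instead of editing pom.xml in place:

```shell
# Hypothetical sample file, mirroring the snippet in the question.
tmp=$(mktemp)
printf '%s\n' \
  '        <dependency>' \
  '            <groupId>GROUP</groupId>' \
  '            <artifactId>ARTIFACT</artifactId>' \
  '            <version>OLD_VERSION</version>' \
  '        </dependency>' > "$tmp"

# On the line after the artifactId match, capture the leading whitespace
# (tabs or spaces) in \1 and re-emit it before the new <version> element.
result=$(sed '/<artifactId>ARTIFACT<\/artifactId>/{
n
s/^\([[:space:]]*\).*/\1<version>NEW_VERSION<\/version>/
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Once the output looks right, the same script can be run with -i against the real file. It still assumes the <version> line directly follows the <artifactId> line.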

potong
  • I voted this down (-1) because it teaches the OP bad practices. I edited my answer to explain why you should not parse XML with regex. – Gilles Quénot Dec 25 '19 at 13:43
  • @GillesQuenot no problem. I voted you up (+1) for using the best tool. – potong Dec 25 '19 at 23:11
  • I haven't asked a ton of questions on SO, so I'm not exactly sure what the right etiquette is here. This answer does exactly what I was asking for, using the tool I was trying to use. But there is another answer which is supposedly a 'best practice' on Linux. Which answer should I mark as correct? – Chris Savory Jan 23 '20 at 16:40
  • I think this answer is correct. It really answers the question, which was about the use of sed. There may be many reasons why someone doesn't (or can't) use another tool. – Squake Jul 27 '23 at 13:01