
I have a file with the data below. Let's call it myfile.xml:

.........
<header>unique_name</header>
......
somelines
......
<version>I need only this line</version>
......
......
<version>This is second match of version, which I dont want</version>

Now I'm looking for Linux commands that do the following:

  1. There can be many <header>.*</header> lines, but I need <header>unique_name</header>. This is a unique header name that I will hard-code. It appears only once in the file, but can appear anywhere in it.

  2. Find the first <version>.*</version> line that appears after <header>unique_name</header> in myfile.xml and replace it with <version>new version number</version>.

I've tried implementing this using grep, sed, and awk, but I could not get it to work. Please advise.

Input and Expected Output:

Input variables:

  • stringtoFIND=<header>unique_name</header>
  • newversionNUMBER=new_version_number

The contents of myfile.xml:

<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>I need only this line</version>
...........
..........
<version>I Dont need this line</version>
.........

Expected output

<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........
kenorb
toLearn
  • can you be more specific about the expected output? – artm Jul 23 '16 at 14:00
  • Can you provide both your earlier attempts, what they provide, and a more concrete example? You mention both `<\header>`, which wouldn't be xml compliant. If your data is xml, and you are looking for a particular key in it, have you considered using an xml parser? It's not really a good idea to parse html with regular expressions, as explained here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Landak Jul 23 '16 at 14:00
  • Updated the question as per experts comments – toLearn Jul 23 '16 at 14:38
  • The `myfile.xml` and the ***Expected output*** appear to be the same. – agc Jul 24 '16 at 22:44
  • @agc , "I need only this line" in myfile.xml has been replaced by "new_version_number" in Expected output. – toLearn Jul 25 '16 at 06:20
  • Use an XML-aware tool like `xmlstarlet`. – Michael Vehrs Jul 25 '16 at 06:45

3 Answers

1

Using GNU awk for the 3rd arg to match():

$ cat tst.awk
match($0,/<header>(.*)<\/header>/,a) {
    inBlock = (a[1] == "unique_name" ? 1 : 0)
}

inBlock && match($0,/(.*<version>).*(<\/version>.*)/,a) {
    $0 = a[1] "new_version_number" a[2]
    inBlock = 0
}

{ print }

$ awk -f tst.awk file
<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........
Ed Morton
  • Hi , Thanks alot. When I run ,I'm getting following error: awk: tst.awk : line1 : syntax error at or near , awk: tst.awk : line5 : syntax error at or near , – toLearn Jul 24 '16 at 09:55
  • My crystal ball tells me you aren't using GNU awk as the answer says you need. Get GNU awk. If that's not it then my next guess is you copy/pasted the script incorrectly. If it's none of those then edit your question to show the script you can and the awk version (`awk --version`) you are using. – Ed Morton Jul 24 '16 at 21:34
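For readers who hit the same syntax error because their awk is not GNU awk, here is a minimal sketch of the same idea using only the two-argument match(), which POSIX awk provides. It relies on RSTART and RLENGTH instead of the capture array. The function name replace_version_posix is made up for illustration, and the header name and new version are hard-coded as in the answer above.

```shell
# Hypothetical wrapper; pass the input file(s) as arguments.
replace_version_posix() {
    awk '
    # Two-argument match() is POSIX; it sets RSTART and RLENGTH.
    match($0, /<header>.*<\/header>/) {
        # Skip the 8-char "<header>" and drop the 9-char "</header>".
        name = substr($0, RSTART + 8, RLENGTH - 17)
        inBlock = (name == "unique_name")
    }
    inBlock && match($0, /<version>.*<\/version>/) {
        # Keep text before and after the element, replace only the element.
        $0 = substr($0, 1, RSTART - 1) "<version>new_version_number</version>" substr($0, RSTART + RLENGTH)
        inBlock = 0
    }
    { print }
    ' "$@"
}
```

Call it as `replace_version_posix myfile.xml`. Like the original script, it changes only the first version line after the matching header and stops looking at the next header.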
0

You can do this with awk like this.

script.awk

/<header>unique_name<\/header>/ { found=1; done=0 }
/<version>.*<\/version>/ && found && !done {
      # replace version in $0
      gsub(/<version>.*<\/version>/,"<version>new_version_number</version>")
      done = 1
    }

# implicitly print current $0:
1

Run the script: awk -f script.awk yourfile > newfile

Every line is printed. The version is replaced only on the first <version> line after the target header; the flags found and done track that state.

Lars Fischer
  • If the target header block doesn't contain a version line, this will change the first version line in a different block after it. Also, you're only doing 1 substitution so you should be using sub() not gsub(). – Ed Morton Jul 23 '16 at 21:36
0

A similar answer to the one by Lars Fischer:

#! /usr/bin/awk -f

/<header>.*<\/header>/ {
    looking = 0
}

 /<header>unique_name<\/header>/ {
    looking = 1
}

looking && /<version>.*<\/version>/ {
    # match() returns the start position; RLENGTH holds the length of the match
    match($0, /^ *<version>/)
    $0 = substr($0, 1, RLENGTH) Version "</version>"
    looking = 0
}

{ print }

I construct the new version line instead of substituting it. In rules, I put the boolean before the regex because it's more efficient, not that you'll notice. I personally dislike ending the script with 1 to indicate printing, but that's just a style choice.

Invoke as

$ awk -v Version="$version" -f script.awk input
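Since the question keeps both the header and the new version in shell variables, the same approach can take both from outside the script. The following is only a sketch under that assumption: the function name update_version and the awk variable names target and version are invented here, while stringtoFIND and newversionNUMBER come from the question. Note that awk -v interprets backslash escapes in the value, so a value containing backslashes would need extra care.

```shell
stringtoFIND='<header>unique_name</header>'
newversionNUMBER='new_version_number'

# Hypothetical wrapper; pass the input file(s) as arguments.
update_version() {
    awk -v target="$stringtoFIND" -v version="$newversionNUMBER" '
    # Every header line decides whether we are inside the wanted block.
    /<header>.*<\/header>/ { looking = (index($0, target) > 0) }

    looking && match($0, /<version>.*<\/version>/) {
        # Build the line by concatenation, so "&" or "\" in the
        # new version are not treated specially as they would be by sub().
        $0 = substr($0, 1, RSTART - 1) "<version>" version "</version>" substr($0, RSTART + RLENGTH)
        looking = 0
    }

    { print }
    ' "$@"
}
```

To update the file in place, one option is to write to a temporary file and move it over the original: `update_version myfile.xml > myfile.xml.new && mv myfile.xml.new myfile.xml`.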
James K. Lowden