Search for a line in a file and replace the next line matching a pattern with a new line in Linux (shell scripting)

Question

I have a file with the data below. Let's call it myfile.xml:

.........
<header>unique_name</header>
......
somelines
......
<version>I need only this line</version>
......
......
<version>This is second match of version, which I dont want</version>

I'm looking for Linux commands that do the following:

There can be many <header>.*</header> lines, but I need <header>unique_name</header>. This is a unique header name that I will hardcode. It appears only once in the file, but it can appear anywhere in it.
Search for the first <version>.*</version> line that appears after <header>unique_name</header> in myfile.xml and replace it with <version>new version number</version>.

I've tried implementing this with grep, sed, and awk, but I couldn't get it to work. Please advise.

Input and Expected Output:

Inputs for "myfile.xml":

stringtoFIND=<header>unique_name</header>
newversionNUMBER=new_version_number

The contents of myfile.xml:

<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>I need only this line</version>
...........
..........
<version>I Dont need this line</version>
.........

Expected output

<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........

Can you provide both your earlier attempts, what they provide, and a more concrete example? You mention both `<\header>`, which wouldn't be xml compliant. If your data is xml, and you are looking for a particular key in it, have you considered using an xml parser? It's not really a good idea to parse html with regular expressions, as explained here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Landak, Jul 23 '16 at 14:00
The `myfile.xml` and the ***Expected output*** appear to be the same. — agc, Jul 24 '16 at 22:44
@agc, "I need only this line" in myfile.xml has been replaced by "new_version_number" in the Expected output. — toLearn, Jul 25 '16 at 06:20

score 1 · Answer 1 · answered Jul 23 '16 at 21:30

Using GNU awk for the 3rd arg to match():

$ cat tst.awk
match($0,/<header>(.*)<\/header>/,a) {
    inBlock = (a[1] == "unique_name" ? 1 : 0)
}

inBlock && match($0,/(.*<version>).*(<\/version>.*)/,a) {
    $0 = a[1] "new_version_number" a[2]
    inBlock = 0
}

{ print }

$ awk -f tst.awk file
<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>

<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........

Ed Morton

Hi, thanks a lot. When I run it, I get the following errors: `awk: tst.awk : line1 : syntax error at or near ,` `awk: tst.awk : line5 : syntax error at or near ,` – toLearn Jul 24 '16 at 09:55
My crystal ball tells me you aren't using GNU awk, as the answer says you need. Get GNU awk. If that's not it, then my next guess is that you copy/pasted the script incorrectly. If it's none of those, then edit your question to show the script you ran and the awk version (`awk --version`) you are using. – Ed Morton Jul 24 '16 at 21:34
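For awks without GNU's three-argument match() (which is what the error above suggests), a portable variant can extract the header name with sub() and rebuild the version line from RSTART/RLENGTH, both of which POSIX awk provides. This is a sketch, not the answer's own code: the function name replace_version is hypothetical, and it assumes the header is passed as the bare name (unique_name) rather than the whole <header>...</header> tag held in stringtoFIND.

```shell
#!/bin/sh
# replace_version FILE HEADER NEWVERSION   (hypothetical helper name)
# Portable sketch: uses only POSIX awk features, so it does not need
# the GNU-only third argument to match().
replace_version() {
    awk -v hdr="$2" -v ver="$3" '
        # Every <header> line starts a new block; remember whether it is the target.
        /<header>.*<\/header>/ {
            name = $0
            sub(/.*<header>/, "", name)
            sub(/<\/header>.*/, "", name)
            inBlock = (name == hdr)
        }
        # Rewrite only the first <version> line inside the target block.
        inBlock && /<version>.*<\/version>/ {
            match($0, /<version>.*<\/version>/)   # sets RSTART and RLENGTH
            $0 = substr($0, 1, RSTART - 1) "<version>" ver "</version>" substr($0, RSTART + RLENGTH)
            inBlock = 0
        }
        { print }
    ' "$1"
}
```

Usage: `replace_version myfile.xml unique_name new_version_number > newfile`. Because any other <header> line resets inBlock, a target block without a <version> line leaves later blocks alone. Note that values passed with `-v` undergo escape processing, so backslashes in the version string are interpreted.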

score 0 · Answer 2 · answered Jul 23 '16 at 14:40

You can do this with awk:

script.awk

/<header>unique_name<\/header>/ { found=1; done=0 }
/<version>.*<\/version>/ && found && !done {
    # replace version in $0
    gsub(/<version>.*<\/version>/,"<version>new_version_number</version>")
    done = 1
}

# implicitly print current $0:
1

Run the script: awk -f script.awk yourfile > newfile

Each line is printed, and the version is replaced according to the state of the found and done flags.

Lars Fischer

If the target header block doesn't contain a version line, this will change the first version line in a different block after it. Also, you're only doing 1 substitution so you should be using sub() not gsub(). – Ed Morton Jul 23 '16 at 21:36
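The failure described in the comment above can be reproduced with a hypothetical three-line input where the target block has no <version> line. The sketch below runs the answer's logic unchanged, then a variant addressing both points in the comment: every <header> line recomputes found, and sub() replaces gsub(). The function names answer2, answer2_fixed, and sample are illustrative only.

```shell
#!/bin/sh
# Answer 2 logic as posted: found is never cleared by a later, different header.
answer2() {
    awk '
    /<header>unique_name<\/header>/ { found=1; done=0 }
    /<version>.*<\/version>/ && found && !done {
        gsub(/<version>.*<\/version>/,"<version>new_version_number</version>")
        done = 1
    }
    1'
}

# Variant: any <header> line sets found to 1 or 0 depending on whether it is the
# target, and sub() is used because only one substitution is wanted.
answer2_fixed() {
    awk '
    /<header>.*<\/header>/ { found = /<header>unique_name<\/header>/; done = 0 }
    /<version>.*<\/version>/ && found && !done {
        sub(/<version>.*<\/version>/, "<version>new_version_number</version>")
        done = 1
    }
    1'
}

# Hypothetical input: the target block has no <version> line of its own.
sample() {
    printf '%s\n' \
        '<header>unique_name</header>' \
        '<header>other</header>' \
        '<version>belongs to other</version>'
}

sample | answer2        # third line becomes <version>new_version_number</version>
sample | answer2_fixed  # input is printed unchanged
```

In awk, a bare regex used as a value evaluates to `$0 ~ /regex/`, so the assignment to found yields 1 only on the target header line.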

score 0 · Answer 3 · edited May 23 '17 at 11:51

A similar answer to the one by Lars Fischer:

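#! /usr/bin/awk -f
# Version is supplied on the command line with -v Version=...

/<header>.*<\/header>/ {
    looking = 0
}

/<header>unique_name<\/header>/ {
    looking = 1
}

looking && /<version>.*<\/version>/ {
    # keep the leading whitespace and opening tag (RLENGTH chars), then append the new value
    if (match($0, /^[ \t]*<version>/))
        $0 = substr($0, 1, RLENGTH) Version "</version>"
    looking = 0
}

{ print }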

I construct the new version line instead of substituting it. In the rules, I put the boolean test before the regex because it's more efficient (not that you'll notice). I personally dislike ending the script with 1 to indicate printing, but that's just a style choice.

Invoke as

$ awk -v Version="$version" -f script.awk input
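One practical benefit of constructing the line rather than substituting it: in the replacement text of sub() and gsub(), `&` stands for the matched text, so a version value containing `&` gets mangled. The sketch below, with hypothetical function names and a hypothetical version string, simplifies the construction step by ignoring leading whitespace.

```shell
#!/bin/sh
# Substitution: sub() treats "&" in the replacement as "the matched text".
by_substitution() {
    awk -v Version="$1" '{ sub(/<version>.*<\/version>/, "<version>" Version "</version>"); print }'
}

# Construction: the value is only concatenated, never interpreted.
by_construction() {
    awk -v Version="$1" '/<version>.*<\/version>/ { $0 = "<version>" Version "</version>" } { print }'
}

# Hypothetical version string containing "&".
echo '<version>old</version>' | by_substitution 'R&D-2'   # <version>R<version>old</version>D-2</version>
echo '<version>old</version>' | by_construction 'R&D-2'   # <version>R&D-2</version>
```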

@EdMorton, modified to address your concerns. – James K. Lowden Jul 23 '16 at 23:42
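All three answers write the result to standard output. To update myfile.xml itself, one common pattern is to write to a temporary file and rename it over the original only if awk succeeds. The helper below is a hypothetical sketch of that pattern, not part of any answer; one caveat is that the file ends up with mktemp's restrictive permissions (usually 0600).

```shell
#!/bin/sh
# awk_in_place FILE AWK-ARGS...   (hypothetical helper name)
# Runs awk with the given arguments on FILE and replaces FILE with the output,
# but only if awk exits successfully. The temporary file is created next to
# FILE so that mv is a rename within the same filesystem.
awk_in_place() {
    file=$1
    shift
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if awk "$@" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage with the script above: `awk_in_place myfile.xml -v Version="$version" -f script.awk`.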
