I have an xml file with the format

<movie>
<title>Title</title>
<originaltitle>Original Title</originaltitle>
<id>ID1234</id>
</movie>

I am failing to use sed to merge the original title and the id tags, as below:

<movie>
<title>Title</title>
<originaltitle>ID1234 - Original Title</originaltitle>
</movie>

How can I save the match on the id and reuse it when modifying the original title tag? Note that the id tag is optional and therefore not always present, in which case the original title should remain unchanged. I could write a script that loops over the file's tags to achieve the same result, but I thought someone might come up with an elegant sed solution. Any ideas? I can match each entry individually, but I don't know how to preserve one match so I can use it later. So far I've got this, which does not work:

sed '/<id>(.*)<\/id>/ {s/<sorttitle>(.*)<\/sorttitle>/<sorttitle>\1 - \2<\/sorttitle>/}' movie.nfo
Sandrew
  • Requisite admonition: http://stackoverflow.com/q/1732348/1072112 – ghoti Nov 29 '16 at 00:27
  • As others have pointed out, using line-oriented tools to process XML is not a good idea. Furthermore, combining title and id seems like a spectacularly bad idea. – Michael Vehrs Nov 29 '16 at 07:17

3 Answers

Don't use sed to process XML files, use an XML-aware tool.

I currently maintain xsh which makes your task really simple:

open file.xml ;
insert text " - " prepend /movie/originaltitle ;
move /movie/id/text() prepend /movie/originaltitle ;
delete /movie/id ;
save :b ;
choroba

If you prefer (GNU) sed, the following command solves this:

sed -e 'N;' \
    -e '/<\/id>$/ s/<originaltitle>\(.*\)<\/originaltitle>\n<id>\(.*\)<\/id>/<originaltitle>\2 - \1<\/originaltitle>/;' movie.nfo

The first command (N) appends the next line to the pattern space, so sed always works on two lines at a time.

The second command fires whenever the pattern space ends with </id>. The s command then swaps the id and originaltitle values and drops the <id> line.
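
Because an unconditional N pairs lines strictly two at a time, the command above only matches when <originaltitle> lands on an odd-numbered line. A minimal sketch of a variant that avoids this, assuming <id> (when present) sits on the line right after <originaltitle>, is to run N only on the title line. The function name merge_id is hypothetical, used here for illustration only:

```shell
# merge_id: hypothetical helper name, not part of the original answer.
# Assumes each tag is on its own line and <id>, if present,
# immediately follows <originaltitle>.
merge_id() {
  # On an <originaltitle> line, pull in the next line, then try to
  # fold the id into the title. If the next line is not an <id> line,
  # the substitution does not match and both lines print unchanged.
  sed -e '/<originaltitle>/N' \
      -e 's/<originaltitle>\(.*\)<\/originaltitle>\n<id>\(.*\)<\/id>/<originaltitle>\2 - \1<\/originaltitle>/' "$@"
}
```

Usage: merge_id movie.nfo, or pipe the file into merge_id. Entries without an <id> line pass through untouched.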

FloHe

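In awk. The <originaltitle> and <id> lines are held back until </movie> is reached, where they are combined (if an <id> was seen) and printed, so a movie without an <id> keeps its original title. Tag and closing tag are expected to be in the same record.

$ awk '/<originaltitle>/ { ot=$0; next }                      # hold the title line
       /<id>/            { gsub(/<\/?id>/,""); id=$0; next }  # keep only the id text
       /<\/movie>/       { if (id != "") sub(/<originaltitle>/,"&" id " - ",ot)
                           if (ot != "") print ot             # title is printed even without an id
                           ot=id="" }
       1' file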
<movie>
<title>Title</title>
<originaltitle>ID1234 - Original Title</originaltitle>
</movie>
James Brown