
I want to replace specific XML node values using sed or awk. I can't use specialized XML-parsing packages such as xmlstarlet or xmllint; I have to use sed or awk, just the "basic" shell.

I have many big XML files. In each file I want to target and replace the values of two tags, for example:

<desc:partNumber>2</desc:partNumber>
<desc:dateIssued>1870</desc:dateIssued>

The problem is that there are hundreds of tags with these names. However, these two tags have a parent tag that is unique within the whole XML file:

<desc:desc ID="DESC_VOLUME_0001">

Another problem is that the location (line number) of the <desc:partNumber> and <desc:dateIssued> tags inside the parent <desc:desc ID="DESC_VOLUME_0001"> is different in every file.

I think the solution would be:

  1. Target the parent <desc:desc ID="DESC_VOLUME_0001"> and extract it and its children into a variable
  2. Iterate over the children, get the location (line number) of <desc:partNumber> and <desc:dateIssued>, and save it to a variable
  3. Pass the line number to a sed command and replace the current value of that tag with the new value (the new value will be read from a .csv file)
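
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the temporary sample file, the second `DESC_VOLUME_0000` block, and the hard-coded `NEW_DATE_ISSUED` value are hypothetical, and it relies on each element sitting on its own line as in the example.

```shell
#!/bin/sh
# Hypothetical sample data; the real files are much larger.
file=$(mktemp)
cat > "$file" <<'EOF'
<desc:desc ID="DESC_VOLUME_0000">
    <desc:originInfo>
        <desc:dateIssued>1860</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
<desc:desc ID="DESC_VOLUME_0001">
    <desc:titleInfo>
        <desc:partNumber>2</desc:partNumber>
    </desc:titleInfo>
    <desc:originInfo>
        <desc:dateIssued>1870</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
EOF

NEW_DATE_ISSUED=1850   # assumption: would normally come from the .csv file

# Steps 1 and 2: find the line number of <desc:dateIssued> inside the parent block.
line=$(awk '/<desc:desc ID="DESC_VOLUME_0001">/ { inside = 1 }
            inside && /<desc:dateIssued>/        { print NR; exit }
            inside && /<\/desc:desc>/            { exit }' "$file")

# Without a line number, sed would apply the substitution to every line, so stop here.
[ -n "$line" ] || { echo "tag not found" >&2; exit 1; }

# Step 3: replace the element text on that one line only.
result=$(sed "${line}s|>[^<]*<|>${NEW_DATE_ISSUED}<|" "$file")
printf '%s\n' "$result"
rm -f "$file"
```

The substitution prints to standard output instead of editing in place, so the result can be checked before overwriting anything.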

I tried to create this sed command. As you can see, I used 'n' to step over lines, but the number of lines to skip needs to be variable.

sed -i '/desc:desc ID="DESC_VOLUME_0001"/{n;n;n;n;n;n;n;n;n;s/'"${OLD_DATE_ISSUED}"'/'"${NEW_DATE_ISSUED}"'/}'

Parent node with children:

<desc:desc ID="DESC_VOLUME_0001"> 
    <desc:physicalDescription> 
        <desc:note>text</desc:note> 
    </desc:physicalDescription>  
    <desc:titleInfo> 
        <desc:partNumber>2</desc:partNumber> 
    </desc:titleInfo>  
    <desc:originInfo> 
        <desc:dateIssued>1870</desc:dateIssued> 
    </desc:originInfo>  
    <desc:identifier type="uuid">81e32d30-6388-11e6-8336-005056827e52</desc:identifier> 
</desc:desc> 

Can anybody help me achieve this?

Michael
  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). – Cyrus Oct 16 '20 at 12:08
  • *can't use specialized packages for parsing xml like xmlstarlet, xmllint etc.* You should tell whoever's making that decision that they're kneecapping you. Using an XML aware tool is the only way to do this robustly and effectively. – Shawn Oct 16 '20 at 13:03
  • Why not use separate sed substitution for opening and closing tag? And do take care of the angle brackets lest you will accidentally replace desc:partNumberClient also. – Gyro Gearloose Oct 16 '20 at 13:45
  • "I have to use sed or awk, just "basic" shell." No you don't. It's the wrong tool for the job. Your code will be incorrect and inefficient. – Michael Kay Oct 16 '20 at 16:48
  • Related: *bobince*'s cautionary answer to [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – agc Oct 16 '20 at 18:28

1 Answer

With the example data in the file xmldata (gensub is a GNU awk extension, so this needs gawk):

awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' \
  '$2 ~ /desc ID/ {
                     # remember the ID of the most recent desc element
                     split($2,arr,"\"");
                     descID=arr[2]
                  }
   $2 ~ /desc:partNumber/ {
                            if (descID==dID) {
                                               # third argument is "how": 1 = replace the first match in $0
                                               $0=gensub($3,part,1)
                                             }
                          }
   $2 ~ /desc:dateIssued/ {
                            if (descID==dID) {
                                               $0=gensub($3,dissue,1)
                                             }
                          }
   1' xmldata

One liner:

 awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '$2 ~ /desc ID/ { split($2,arr,"\"");descID=arr[2] } $2 ~ /desc:partNumber/ { if (descID==dID) { $0=gensub($3,part,1) } } $2 ~ /desc:dateIssued/ { if (descID==dID) { $0=gensub($3,dissue,1) } }1' xmldata

Here we set the field delimiters to < or >. We also set dID to the desc ID we want to search for, part to the new partNumber value, and dissue to the new dateIssued value.

We then search for the desc ID text in the second field and split that field on double quotes. The second element of the array arr is the ID itself, which we store in the variable descID.
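
As a small illustration of that step in isolation, the following feeds a single opening tag (copied from the question's sample) through the same field splitting and prints the extracted ID:

```shell
# With < and > as delimiters, $2 is: desc:desc ID="DESC_VOLUME_0001"
# Splitting $2 on double quotes leaves the ID in arr[2].
id=$(printf '%s\n' '    <desc:desc ID="DESC_VOLUME_0001"> ' |
     awk -F'[<>]' '{ split($2, arr, "\""); print arr[2] }')
printf '%s\n' "$id"   # prints DESC_VOLUME_0001
```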

We further search for partNumber and dateIssued, checking whether descID equals dID. If they match, we use the gensub function to replace the first occurrence of the third delimited field (the element's current text) in the line $0 with the passed variable, and assign the result back to $0. Finally, the pattern 1 prints every line, changed or not.
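
If gawk is not available, a variant that sticks to POSIX awk might look like the sketch below. It is an alternative technique, not the command above: it tracks whether the current line is inside the target block and uses sub() with a fixed pattern instead of treating the old value as a regex. The sample file and the extra `DESC_VOLUME_0002` block are hypothetical, and the replacement values are assumed not to contain `&` or backslashes, which sub() treats specially.

```shell
#!/bin/sh
# Hypothetical sample file modelled on the question's example.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<desc:desc ID="DESC_VOLUME_0001">
    <desc:titleInfo>
        <desc:partNumber>2</desc:partNumber>
    </desc:titleInfo>
    <desc:originInfo>
        <desc:dateIssued>1870</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
<desc:desc ID="DESC_VOLUME_0002">
    <desc:titleInfo>
        <desc:partNumber>9</desc:partNumber>
    </desc:titleInfo>
</desc:desc>
EOF

result=$(awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue="1850" '
    # Enter the target block on its opening tag, leave it on the closing tag.
    index($0, "<desc:desc ID=\"" dID "\">") { inside = 1 }
    # Replace only the text between the first ">" and the next "<" on the line.
    inside && /<desc:partNumber>/ { sub(/>[^<]*</, ">" part "<") }
    inside && /<desc:dateIssued>/ { sub(/>[^<]*</, ">" dissue "<") }
    inside && /<\/desc:desc>/     { inside = 0 }
    { print }
' "$xml")

printf '%s\n' "$result"
rm -f "$xml"
```

Because the flag is cleared at `</desc:desc>`, tags with the same names in later blocks are left untouched. Like the original, this assumes one element per line.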

Raman Sailopal