0

it's been a while since I have used AWK, but now i have an XML file where I like to increase an Id for a specific column; so it's an excellent task for AWK. In theory, to increase the Id, the line gets parsed, you fish out the Id into a variable, ++ it, and reconstruct the line to print to the resulting stream. However, when I use the variable ( x = sprintf(...)) I got weird results, so I used printf to debug the thing. And now the weird part : printf dumps exactly the right Id, but the variable gets rubbish, although they have the same input & syntax ... Must be somethinig really silly off course, but I can't put my finger on it.

All Ids are in the form :

<column name="Id" type="System.Int32">x</column>

This is the code :

#!/bin/ksh
cd /mnt/c/J/D/Work/GIT/Repos/SkillsNG/SkillsNG.WebTests/Snapshots
print -n "Snapshot name: "
read snapshot
defaultId=0
print -n "Start increasing from Id [$defaultId]":
read id

[[ "$id" = '' ]] && id=$defaultId

cat $snapshot | awk -F '>' 'BEGIN {process=0;} {
if (match($0, /SecurityPermissions/))
     {process=1;}
else
  {
    if (!process)
    {
       # just dump all tables up to SecurityPermissions, no processing needed
       print;
       next;
    }
  }
if (match($1, /<column name="Id" type="System.Int32"/))
  {
   if(match($2,/[0-9]*/))
   {
      printf "param 1: %s\n", $1
      printf "param 2: %s\n", $2
      printf "Id value : %s\n", substr($2, 1, index($2,"\<")-1);

      val = sprintf("%s", substr($2,1, index($2,"\<")-1));
      printf "value in variable: %s\n", $val;
      newval = strtonum(val);
      printf "new: %s\n",  $newval

   }
 }
 else print $0;
}' > $snapshot.new
# mv $snapshot $snapshot.old
# mv $snapshot.new $snapshot
cd -

And this a simple test xml:

    <snapshot culture="en-US">
  <table name="[dbo].[SecurityPermissions]">
    <row>
      <column name="Id" type="System.Int32">1</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">1</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">2</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">50</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">3</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">51</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">4</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">52</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">5</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">53</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">6</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">54</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">7</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">56</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">8</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">57</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">9</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">77</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">10</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">78</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
      <column name="Id" type="System.Int32">11</column>
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">80</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
  </table>
</snapshot>

Result file :

<snapshot culture="en-US">
  <table name="[dbo].[SecurityPermissions]">
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 1</column
Id value : 1
value in variable:       <column name="Id" type="System.Int32"
new:       <column name="Id" type="System.Int32"
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">1</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 2</column
Id value : 2
value in variable: 2</column
new: 2</column
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">50</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 3</column
Id value : 3
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">51</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 4</column
Id value : 4
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">52</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 5</column
Id value : 5
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">53</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 6</column
Id value : 6
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">54</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 7</column
Id value : 7
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">56</column>
      <column name="Access" type="System.Int32">1</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 8</column
Id value : 8
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">57</column>
      <column name="Access" type="System.Int32">3</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 9</column
Id value : 9
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">77</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 10</column
Id value : 10
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">78</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
    <row>
param 1:       <column name="Id" type="System.Int32"
param 2: 11</column
Id value : 11
value in variable: 
new: 
      <column name="GroupId" type="System.Int32">1</column>
      <column name="Securable" type="System.Int32">80</column>
      <column name="Access" type="System.Int32">15</column>
    </row>
  </table>
</snapshot>

As can be seen from the result file, the Id value (via printf) works correctly, but the same construct to a variable (via sprintf) produces garbage. Anybody got an idea what is going on? Thanks in advance. Cheers, DJ

  • Show your `awk` code that uses `sprintf` since that's the code that is misbehaving. – lurker Feb 13 '19 at 18:01
  • val = sprintf("%s", substr($2,1, index($2,"\<")-1)); it's right under the 3 printf statements param1, param 2 & IdValue – D.J. Hartman Feb 13 '19 at 18:04
  • Ah ok, didn't spot it. Thanks. – lurker Feb 13 '19 at 18:09
  • `$var` doesn't do what you think. Nor `$newval`. – jhnc Feb 13 '19 at 18:35
  • 1
    First of all Welcome To StackOverflow. You seem to be processing XML or HTML with awk. While we admire your courage, it should be mentioned that this is generally considered to be a bad idea. Sooner or later, you will open Pandora's box and all the ugliness of XML will come out. [It could be a Pony](https://stackoverflow.com/questions/1732348), but then an ugly raging one that will destroy your entire belief system. In order to save your future soul, we recommend considering using an XML parser or XSLT. – kvantour Feb 13 '19 at 18:36

1 Answers1

1

First of all, you should not try to do these things with awk or sed or anything like that. XML is a complicated data-structure with all its ugliness connected to it. While a simple awk might do it now, it will fail all of a sudden and you will not know what hit you.

If you want to increment that particular value, you could use the following xmlstarlet command:

 $ xmlstarlet ed --update '//table/row/column[@name="Id"]' -x ".+1" test.xml

It reads like this: xmlstarlet will (ed) edit the file test.xml by (--update) updating all nodes that match the XPath expression ('//table/row/column[@name="Id"]' :: all nodes column child of row child of of table with an attribute name equal to Id) and change its value with an XPath expression -x ".+1" (increment current value (.))

To answer your question: you obtain the unexpected results with awk because you reference some variables using $. Example:

val = sprintf("%s", substr($2,1, index($2,"\<")-1));
printf "value in variable: %s\n", $val;

In your first line, you compute the value of val but in the second line, you use $val. The latter actually returns the value of the field with number val. So if val=2, then $val will return $2, i.e. the content of the second field.

kvantour
  • 25,269
  • 4
  • 47
  • 72
  • Unfortunately the distro currently has no xmlstarlet, and the xml is fixed, so that's why I went for AWK (okay, and a bit of nostalgia ;-). You are absolutely right about the `$val` vs `val` (kind of like reference vs value in C). I knew it had to be something small but significant. Thanks again for your help! – D.J. Hartman Feb 14 '19 at 09:54
  • @D.J.Hartman xmlstarlet might also just be installed under the name xml. Nonetheless, I'm happy to see your problem is resolved. – kvantour Feb 14 '19 at 10:01