
How do I parse a log file (not a full XML file, but one that contains some XML data) for ExtData tags that hold a name-value pair? I need to mask the value like this, for example:

<ExtData>Name="Jason" Value="Special"</ExtData>
to
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>

I need to mask the ExtData value as above only when Name is Jason or one of a given set of names, not for every Name.

For example, if "DummyName" is not in the set of names, then I do not want to change the line below.

<ExtData>Name="DummyName" Value="Garbage"</ExtData>

For example, if "DummyName" is not in the set of names, then I do not want to change the line below, even though its value is "Jason".

<ExtData>Name="DummyName" Value="Jason"</ExtData>

For example, if "DummyJasonName" is not in the set of names, then I do not want to change the line below, even though "Jason" appears between "Dummy" and "Name".

<ExtData>Name="DummyJasonName" Value="Garbage"</ExtData>

I need to do all this in bash/shell script.

The bottom line: I want to read a file with, say, sed or awk, and check each line for an ExtData tag. If one is found, read the text between the ExtData and /ExtData tags. In this (possibly multiline) text, extract Name. If Name is in the set of names, mask the corresponding "Value" data with an equal number of 'X' characters.

Please let me know how to achieve the above task.

Update: the input can actually span multiple lines.

<ExtData>Name="Jason" 
Value="Special"
    </ExtData>

Or like this too:

<ExtData>
     Name="Jason" 
  Value="Special"
    </ExtData>

Thanks !! Puneet

Puneet Jain

2 Answers


In a bash shell, you can create a copy of the file with the value masked using this:

sed 's#\(<ExtData>Name="Jason" Value="\).*\("</ExtData>\)#\1XXXXX\2#' xml.txt > xml_xxx.txt

Note that this is not the "official" way to change an XML file. Many formatting changes could render this script useless, but if you know that your file has one entry per line formatted like that, it will work just as it would on any text file, and it's quick.

(Also, the question is tagged sed and bash; otherwise the answer would involve real XML parsing with libxml2, Saxon, or another library that can parse XML nodes.)

Jean-François Fabre
  • Thanks @jean-françois-fabre. I see that you have hardcoded Jason and XXXXX. This might work for this particular line, but as I mentioned previously, the name is retrieved from an array, list_of_names_which_needs_value_changed (Jason Jack Alter Viktoria ....................... 100 names). So I don't want to change the value for every name, just for the names present in the list_of_names_which_needs_value_changed array. Also, the values can be of arbitrary size, not fixed at 5 characters, and there can be spaces between the ExtData tag and Name, between Name and Value, and between Value and /ExtData. – Puneet Jain Aug 04 '16 at 22:46

To mask the value only on lines that mention Jason or Jim, try:

sed -E '/Jason|Jim/{:a; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file.xml

This command was tested on GNU sed. For BSD/OSX sed, some minor changes would be needed.

Example

Let's consider this test file:

$ cat file.xml
<ExtData>Name="Jason" Value="Special"</ExtData>
<ExtData>Name="DummyName" Value="Garbage"</ExtData>
<ExtData>Name="Jim"
    Value="OK"
        </ExtData>

Now, let's run our command:

$ sed -E '/Jason|Jim/{:a; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file.xml
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>
<ExtData>Name="DummyName" Value="Garbage"</ExtData>
<ExtData>Name="Jim"
    Value="XX"
        </ExtData>
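
Regarding the BSD/OSX note above: one commonly used workaround is to pass each sed command as its own -e argument, since BSD sed expects a label to be the last thing on its line. The sketch below rewrites the same command that way. It is an assumption that this form is accepted by BSD/macOS sed; it has only been reasoned through against GNU sed behaviour here, and the variable name masked is just for illustration.

```shell
# Same logic as the one-liner, with one sed command per -e chunk so that
# every label (:a, :b, bb, ba, tb) ends its own script line.
# Assumption: this form is also accepted by BSD/macOS sed (not verified here).
masked=$(printf '%s\n' \
  '<ExtData>Name="Jason" Value="Special"</ExtData>' \
  '<ExtData>Name="DummyName" Value="Garbage"</ExtData>' \
  '<ExtData>Name="Jim"' \
  '    Value="OK"' \
  '        </ExtData>' |
  sed -E -e '/Jason|Jim/{' -e ':a' -e '/Value=/bb' -e 'n' -e 'ba' \
         -e ':b' -e 's/(Value="X*)[^X"]/\1X/' -e 'tb' -e '}')

printf '%s\n' "$masked"
```

On GNU sed this should print the same five lines as the example output above.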

How it works

  • -E

    This tells sed to use extended regular expressions.

  • /Jason|Jim/{...}

    This tells sed to run the commands inside the curly braces only for lines that contain Jason or Jim. The commands inside the braces break down into two parts:

    1. :a; /Value=/bb; n; ba;

      The first part reads lines until we find one that contains Value=. In more detail, :a defines a label a. /Value=/bb branches to label b if the current line contains Value=. If it doesn't, we print out the current line and read in the next one using the n command. We then branch (b) back to label a.

    2. :b; s/(Value="X*)[^X"]/\1X/; tb;

      This replaces the value with as many X as we need.

      In more detail, :b defines a label b. s/(Value="X*)[^X"]/\1X/ substitutes in the next X that we need after Value=. If a substitution was made (meaning that another X was needed), then the test command (t) tells sed to jump back to label b and we try again.
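
The masking loop in part 2 can be tried in isolation on a single line. This is a minimal sketch assuming GNU sed, as in the answer; mask_value is a hypothetical helper name introduced only for this illustration.

```shell
# Each pass of the loop turns the first not-yet-masked character after
# Value=" into an X; "t" jumps back to :b for as long as a substitution
# was made, so the value keeps its original length.
mask_value() {
  sed -E ':b; s/(Value="X*)[^X"]/\1X/; tb'
}

masked=$(printf '%s\n' 'Name="Jason" Value="Special"' | mask_value)
printf '%s\n' "$masked"
```

This prints Name="Jason" Value="XXXXXXX": seven characters in, seven X out. A value that already contains an X is handled too, because X* skips over existing X characters before the next character is replaced.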

Restricting changes to within ExtData tags

Let's consider this more complex test file:

$ cat file2.xml
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jason" Value="Special"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="DummyName" Value="DontChange"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jim"
    Value="OK"
        </ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>

To make the changes in ExtData tags but not the other tags, try:

$ sed -E '/[<]ExtData[>]/{:a; /Name=/{/Name="(Jason|Jim)"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file2.xml
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jason" Value="XXXXXXX"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="DummyName" Value="DontChange"</ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>
<ExtData>Name="Jim"
    Value="XX"
        </ExtData>
<Misc>Name="Jason" Value="DontChange"</Misc>

To do the above using a shell variable for the names:

names='Jason|Jim'
sed -E '/[<]ExtData[>]/{:a; /Name=/{/Name="('"$names"')"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file2.xml

This substitutes the shell variable directly into the sed command, inside parentheses so that the alternation stays grouped exactly as in the command above. Do this only if you trust the source of the shell variable.
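
If the names live in a bash array, as described in the comments, they can be joined with | to build the names variable. The following is a sketch under a few assumptions: bash and GNU sed are available, the names contain no regex metacharacters, and the sample input (taken from the cases in the question) stands in for the real log file. The parentheses around the spliced-in names keep the alternation grouped, so the Name=" prefix and the closing quote apply to every name.

```shell
#!/usr/bin/env bash
# Names whose values should be masked (array name taken from the comments).
NameList=(Jason Jim)

# Join the array elements with "|" to form a regex alternation.
# Assumption: no name contains characters that are special in a regex.
names=$(IFS='|'; printf '%s' "${NameList[*]}")

# Sample input for illustration only; in practice sed would read the log file.
masked=$(printf '%s\n' \
  '<ExtData>Name="Jason" Value="Special"</ExtData>' \
  '<ExtData>Name="DummyName" Value="Jason"</ExtData>' \
  '<ExtData>Name="DummyJasonName" Value="Garbage"</ExtData>' \
  '<ExtData>' \
  '     Name="Jim"' \
  '  Value="OK"' \
  '    </ExtData>' |
  sed -E '/[<]ExtData[>]/{:a; /Name=/{/Name="('"$names"')"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }')

printf '%s\n' "$masked"
```

Only the Jason and Jim values are masked here; the DummyName and DummyJasonName lines pass through unchanged because the pattern requires the whole quoted name to match.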

John1024
  • Thanks. I knew you would give me a perfect answer. I am very new to scripting, and your responses have surely helped me. Just to clarify, I have a preset list of names like this: declare -a NameList=(Jason Jim) and so forth. So based on your solution, I should loop through each name in NameList and use $name in the sed command like this: sed -i -E '/"$name"/{:a; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' file.xml, including "-i" because I need to change the file content in place. Thanks once again. – Puneet Jain Aug 05 '16 at 07:43
  • One more thing I noticed that This name value pair should be inside ExtData and /ExtData tags.... Because Jason name and/or Jim name can also be found as part of normal text in the log file. – Puneet Jain Aug 05 '16 at 07:47
  • John, pls let me know. Thanks !! – Puneet Jain Aug 10 '16 at 22:31
  • @PuneetJain Does the complete string `Name="Jason"` appear outside of the `ExtData` tags? Or, is it just `Jason` which might appear outside those tags? – John1024 Aug 10 '16 at 22:34
  • @PuneetJain OK. I just came up with an idea and added it to the answer. Let me know if it works for you. – John1024 Aug 10 '16 at 22:50
  • Thank you @John1024. I will try and let you know the result. I hope your command works on this too... < Name="Jason" Value="Change" < Name="Jim" < Value="Change" < – Puneet Jain Aug 11 '16 at 22:05
  • Thanks @John1024 your solution worked like a charm.... Now w.r.t to xml tag, only 1 simple thing (for you) remains... thats it.. and I will be done... http://stackoverflow.com/questions/38911200/change-string-in-file-between-two-strings-with-character-x see if u have a look at it. No body gave me a perfact solution yet on it yet? Thanks again John !! – Puneet Jain Aug 12 '16 at 15:51
  • An easy one for you John : http://stackoverflow.com/questions/38911200/change-string-in-file-between-two-strings-with-character-x @John1024 – Puneet Jain Aug 15 '16 at 22:38
  • Your solution worked like a charm, until I saw a line that was masked when it was not supposed to be. For example, the line Name="DummyName" Value="Garbage" should not be masked, and it is not, which is correct. But if "Jason" appears anywhere in the line, then it is masked as well: Name="DummyName" Value="Jason" is being masked, and Name="DummyJasonName" Value="Garbage" is also being masked. – Puneet Jain Aug 31 '16 at 18:24
  • you there John? @John2014 – Puneet Jain Sep 01 '16 at 13:55