-2

In a Bash script, I am trying to replace, in place in a file, the characters between two given strings with 'X'. I have a bunch of string pairs, and the replacement should happen between each pair.
In the code below, the first string of each pair is declared in the cpi_list array. The second string of each pair is always either %26, &, or the end of the line.

This is what I am doing.

# list of "first" or "start" string
declare -a cpi_list=('%26Name%3d' '%26Pwd%3d')  

# This is the "end" string
myAnd=\%26
newfile="inputlog.txt"

for item in "${cpi_list[@]}";
do
    sed -i -e :a -e "s/\($item[X]*\)[^X]\(.*"$myAnd"\)/\1X\2/;ta" $newfile;
done

The input

CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete

I want to make it

CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete

PS: The last item also needs to change, so %26Name%3dCOTT should become %26Name%3dXXXX even though there is no closing %26, because the end point is either %26 or the end of the line.

But somehow it is not working.

Ed Morton
Puneet Jain
  • What result *are* you getting? – chepner Jan 30 '17 at 19:57
  • %3dCOTT because it is between %26Name%3d and the end of the line. – Puneet Jain Jan 30 '17 at 20:18
  • @chepner This is the result I am getting: `CPI.%26Name%3dXXXXXXXXXXXXXXXXXXXX%26Name%3dCOTT` – Puneet Jain Jan 30 '17 at 20:22
  • @anubhava END OF THE LINE is not very important, i can live without it too... good to have it but not so critical.. but the other part is important.. if you can give the solution, i would really appreciate it. – Puneet Jain Jan 30 '17 at 20:24
  • Possible duplicates of the following posts: https://stackoverflow.com/questions/41885198/replace-all-characters-between-two-strings-in-a-line-by-x https://stackoverflow.com/questions/41864172/bash-script-in-file-replace-characters-with-x-between-two-given-strings-usin https://stackoverflow.com/questions/38911200/change-string-in-file-between-two-strings-with-character-x – alvits Jan 30 '17 at 22:44
  • Yeh @alvits... I was asked to raise it as a separate question.. – Puneet Jain Jan 30 '17 at 22:52
  • How many times is [the same question](https://stackoverflow.com/questions/41944021/sed-substitude-all-characters-between-two-strings-by-char-x#comment71073884_41944021) going to be asked? –  Jan 30 '17 at 23:37
  • Because, @sorontar, if you noticed the last post, it was not perfectly answered. So I had to ask again. – Puneet Jain Jan 31 '17 at 00:20
  • The problem you had previously is that you hadn't tagged your question with awk, only sed. sed is for simple substitutions on individual lines. What you're trying to do is not that so you should be looking for an awk solution, not a sed one. – Ed Morton Jan 31 '17 at 01:46

3 Answers

4

This will work in any awk called from any shell in any UNIX installation:

$ cat tst.awk
BEGIN {
    begs = "%26Name%3d|%26Pwd%3d"
    ends = "%26|&"
}
{
    head = ""
    tail = $0
    while( match(tail, begs) ) {
        tgtStart = RSTART + RLENGTH
        tgt = substr(tail,tgtStart)
        if ( match(tgt, ends) ) {
            tgt = substr(tgt,1,RSTART-1)
        }

        gsub(/./,"X",tgt)
        head = head substr(tail,1,tgtStart-1) tgt
        tail = substr(tail,tgtStart+length(tgt))
    }
    $0 = head tail

    print
}

$ cat file
CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete

$ awk -f tst.awk file
CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete

Just like with a sed substitution, any regexp metacharacters in the beg and end strings would need to be escaped, or we'd have to use a loop with index() calls instead of match() so that we'd do string matching instead of regexp matching.
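The index()-based alternative mentioned above could look roughly like the following. This is only a sketch, not part of the original answer: the wrapper name `mask_literal`, the helper `first()` and the global `Len` are made up for illustration, and the start and end strings are hard-coded as space-separated lists. Because index() compares literal strings, no regexp metacharacter in those lists needs escaping.

```shell
# Hypothetical wrapper: mask values between literal start/end strings.
# Reads the named files, or standard input when no file is given.
mask_literal() {
    awk '
    BEGIN {
        # Start and end strings, matched literally (not as regexps).
        nb = split("%26Name%3d %26Pwd%3d", begs, " ")
        ne = split("%26 &", ends, " ")
    }
    # Position of the earliest occurrence in s of any string in arr (0 if none).
    # The length of the string found there is left in the global Len.
    function first(s, arr, n,    i, p, best) {
        best = 0
        for (i = 1; i <= n; i++) {
            p = index(s, arr[i])
            if (p && (!best || p < best)) { best = p; Len = length(arr[i]) }
        }
        return best
    }
    {
        head = ""; tail = $0
        while ((pos = first(tail, begs, nb)) > 0) {
            start = pos + Len                 # first character of the value
            tgt = substr(tail, start)
            if ((e = first(tgt, ends, ne)) > 0)
                tgt = substr(tgt, 1, e - 1)   # stop at the nearest end string
            len = length(tgt)
            gsub(/./, "X", tgt)               # one X per masked character
            head = head substr(tail, 1, start - 1) tgt
            tail = substr(tail, start + len)
        }
        print head tail
    }' "$@"
}
```

It would be called the same way as the script above, for example `mask_literal file`, and should give the same output for the two sample lines.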

Ed Morton
  • I am closing this post and marking your answer as the final answer. However, one thing that's itching my mind is: can this be done in-file, instead of using "print" and writing the output back to the original file? Just asking. – Puneet Jain Jan 31 '17 at 06:28
  • With GNU awk you can add the `-i inplace` argument, but why bother when you can just do `cmd file > tmp && mv tmp file` for any command, including `awk 'script' file > tmp && mv tmp file`. There's no "instead of print" though - you still need the print either way. – Ed Morton Jan 31 '17 at 06:44
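The `cmd file > tmp && mv tmp file` idiom from the comment above can be wrapped in a small helper. This is a sketch under assumptions: the function name `awk_in_place` is hypothetical, and `mktemp` is assumed to be available (it is not part of POSIX, though most systems ship it). The original file is replaced only if awk exits successfully.

```shell
# Hypothetical helper: run an awk program over a file and overwrite the
# file with the result, but only when awk succeeds.
# Usage: awk_in_place 'awk program' file
awk_in_place() {
    prog=$1
    file=$2
    # Create the temporary file next to the target so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk "$prog" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Note that the replaced file takes on the ownership and permissions of the temporary file, which may differ from those of the original.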
0

It is not pretty but you can use perl:

$ s1="CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT"
$ echo "$s1" | perl -lne 'if (/(?:^.*%26Name%3d)(.*)(?:%26Pwd%3d)(?:.*%26Name%3d)(.*)((?:%26Pwd%3d)|(?:$))/) { 
        $i1=$-[1];
        $l1=$+[1]-$-[1];
        $i2=$-[2];
        $l2=$+[2]-$-[2];
        substr($_, $i1, $l1, "X"x$l1);
        substr($_, $i2, $l2, "X"x$l2);
        print;
        }'
CPI.%26Name%3dXXXXX%26Pwd%3dBOTTLE%26Name%3dXXXX

That is for two pairs like the example. N pairs in a line will be a slight modification.

dawg
0

You can avoid matching %26 by doing this:

a='CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT'
echo "$a" |sed -E ':a;s/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g;ta;'

Note that each encoded character %xx counts for one X.

Casimir et Hippolyte
  • GNU sed only. POSIX it would need a change. – dawg Jan 30 '17 at 21:51
  • @dawg: something like this I suppose: `sed -e 's/%26/\&/g;' -e :a -e 's/\(%3d[^&]*\)[^&X]/\1X/g;ta;s/&/%26/g;'` (I only use GNU sed) – Casimir et Hippolyte Jan 30 '17 at 21:56
  • That doesn't work because the pattern space `a` is scoped locally to each `-e` – dawg Jan 30 '17 at 22:06
  • This will not work, because the inputlog.txt file i have, each line can have many '&' at different places... So I cannot replace %26 with & and vice-versa. – Puneet Jain Jan 30 '17 at 22:25
  • @PuneetJain: choose an other character. – Casimir et Hippolyte Jan 30 '17 at 22:30
  • No @CasimiretHippolyte. Because the input log file I have is very huge, and I am pretty sure the file and each line will contain almost all the characters. Each file has huge lines, with something like 80K words in each line, etc. – Puneet Jain Jan 30 '17 at 22:44
  • @PuneetJain: Try this other approach, which does not substitute `%26`: `sed -E ':a;s/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g;ta'` (note that the advantage is that each `%xx` code is replaced by only one X, since it represents only one character). You can also use: `sed -E ':a;s/(%3dX*)[^%]/\1X/g;ta'` but it doesn't deal with any `%xx` codes other than `%26`. – Casimir et Hippolyte Jan 30 '17 at 23:41
  • That will not work either.. because we need to see how many characters were replaced. Lets say, client sent Pwd as PASSWORD. So we will replace it with XXXXXXXX and not with X because this way we will know that the client sent 8 characters in the password.. Pwd is just one field, there are other types of data too like Zipcode, cardnumber, socialsecurity, pin, lastname, maidenname etc etc etc – Puneet Jain Jan 30 '17 at 23:41
  • @PuneetJain: you need to understand that what you are trying to hide is encoded. For example if your client gives `a#b` as password, the result in your log will be: `a%23b` – Casimir et Hippolyte Jan 30 '17 at 23:44