Hey, I have this script that replaces occurrences of a pattern inside files.

gawk -v RS='<[^>. ]+>' '{ ORS="" }  
    RT {                 
       switch (RT)
       {  
            case /E:/:
                if (!(RT in events))                    
                    events[RT] = eventCount++   
                name=RT    
                sub(/<E:/, "", name)
                sub(/>/, "", name)
                ORS=name ", " events[RT]
                break
           ...
       }                
    }
    {
       print $0 > (FILENAME ".c")
    }' $files

I need to modify the script so that it only replaces the patterns when they are found inside the definitions of functions that I specify in a variable. For example:

gawk -v RS='<[^>. ]+>' -v FUNCS='a(void) b(void) foobar(void)' '{ ORS="" }  
    RT {    
       #if inside of one of FUNCS                                         
       switch (RT)
       {
            case /E:/:
                if (!(RT in events))                    
                    events[RT] = eventCount++   
                name=RT    
                sub(/<E:/, "", name)
                sub(/>/, "", name)
                ORS=name ", " events[RT]
                break
           ...
       }                
    }
    {
       print $0 > (FILENAME ".c")
    }' foo.c bar.c

foo.c:

void a(void)
{
    <E:X>
}

void b(void)
{
    <E:Y>
}

void barfoo(void)
{
    <E:Z>
}

bar.c:

void c(void)
{
    <E:A>
}

void foo_bar(void)
{
    <A:B>
}

After running the script, the files should look like this:

foo.c:

void a(void)
{
    0
}

void b(void)
{
    1
}

void barfoo(void)
{
    <E:Z>
}

bar.c:

void c(void)
{
    <E:A>
}

void foo_bar(void)
{
    2
}

Edit: I have a problem with the current solution because it doesn't work for some functions. The example code I am testing against:

test.c

void foobar_(void)
{
    <E:X>
}


void foobar(void)
{
    <E:X>
}

test.c.tmp

void foobar_(void)
{
    <E:X>
}


void foobar(void)
{
    <0>
}

and the code that I run:

awk -v funcs='foobar(void) foobar_(void)' '
BEGIN {
    split(funcs,tmp)
    for (i in tmp) {
        fnames[tmp[i]]
    }
}
/^[[:space:]]*[[:alnum:]_]+[[:space:]]*[[:alnum:]]+\([^)]*)/ {
    inFunc = ($NF in fnames ? 1 : 0)
}
{
    head = ""
    tail = $0
    while ( inFunc && match(tail,/<E:[^>]+>/) ) {
        tgt = substr(tail,RSTART+1,RLENGTH-2)
        if ( !(tgt in map) ) {
            map[tgt] = cnt++
        }
        head = head substr(tail,1,RSTART) map[tgt]
        tail = substr(tail,RSTART+RLENGTH-1)
    }
    $0 = head tail
}
{
    print $0 > (FILENAME ".tmp")
}' $module_files
  • Thanks for showing your efforts in your question. Could you please add samples of input and expected output more clearly so that we can understand the question better? Thank you. – RavinderSingh13 Jan 28 '21 at 10:37
  • how about now ? – Earl of Lemongrab Jan 28 '21 at 10:46
  • Your script doesn't replace "patterns", it replaces strings that match a regexp using partial matching. See https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern and then replace the word "pattern" with string vs regexp and full vs partial everywhere you use it. – Ed Morton Jan 28 '21 at 12:37

1 Answer

With GNU awk (which you're already using) for the 3rd arg to match() and \s/\w shorthand:

$ cat tst.awk
BEGIN {
    split(funcs,tmp)
    for (i in tmp) {
        fnames[tmp[i]]
    }
}
/^\s*\w+\s*\w+\([^)]*\)/ {
    inFunc = ($NF in fnames ? 1 : 0)
}
inFunc && match($0,/(.*)(<[^>]+>)(.*)/,a) {
    $0 = a[1] (cnt++) a[3]
}
{ print }

or with any awk:

$ cat tst.awk
BEGIN {
    split(funcs,tmp)
    for (i in tmp) {
        fnames[tmp[i]]
    }
}
/^[[:space:]]*[[:alnum:]_]+[[:space:]]*[[:alnum:]_]+\([^)]*\)/ {
    inFunc = ($NF in fnames ? 1 : 0)
}
inFunc && match($0,/<[^>]+>/) {
    $0 = substr($0,1,RSTART-1) (cnt++) substr($0,RSTART+RLENGTH)
}
{ print }

$ awk -v funcs='a(void) b(void) foobar(void)' -f tst.awk foo.c
void a(void)
{
    0
}

void b(void)
{
    1
}

void barfoo(void)
{
    <E:Z>
}

To keep the count incrementing across input files, process all of the files in a single awk invocation: either use gawk -i inplace to update them as you go, or print to temp files as you go and then move them over the originals when the script is done, or similar.
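
As a rough sketch of the temp-file variant, assuming a POSIX shell: the wrapper function number_events and the ".tmp" suffix are hypothetical names chosen for illustration, not part of the original script. It runs the any-awk version once over every file, so cnt is never reset, writes each file's output to FILENAME ".tmp", and moves the results over the originals only if awk succeeded:

```shell
# Hypothetical wrapper: number_events 'f1(void) f2(void)' file1.c file2.c ...
number_events() {
    funcs=$1
    shift
    awk -v funcs="$funcs" '
    BEGIN {
        # Turn the space-separated list into array keys for "in" lookups.
        split(funcs, tmp)
        for (i in tmp) fnames[tmp[i]]
    }
    FNR == 1 {
        # New input file: close the previous output, start a new temp file.
        if (out != "") close(out)
        out = FILENAME ".tmp"
    }
    /^[[:space:]]*[[:alnum:]_]+[[:space:]]*[[:alnum:]_]+\([^)]*\)/ {
        inFunc = ($NF in fnames ? 1 : 0)
    }
    inFunc && match($0, /<[^>]+>/) {
        # cnt lives for the whole awk run, so it carries over between files.
        $0 = substr($0, 1, RSTART-1) (cnt++) substr($0, RSTART+RLENGTH)
    }
    { print > out }
    ' "$@" &&
    for f in "$@"; do
        # An empty input file produces no temp file, so check before moving.
        if [ -e "$f.tmp" ]; then
            mv -- "$f.tmp" "$f"
        fi
    done
}
```

Called as number_events 'a(void) b(void) foo_bar(void)' foo.c bar.c, this should continue the numbering from foo.c into bar.c. Like the script above, it only replaces the first match on each line.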


EDIT: showing that the above works for a function name that includes an underscore:

$ cat bar.c
void c(void)
{
    <E:A>
}

void foo_bar(void)
{
    <A:B>
}

$ awk -f tst.awk -v funcs='foo_bar(void)' bar.c
void c(void)
{
    <E:A>
}

void foo_bar(void)
{
    0
}
Ed Morton
  • Thank you, I have some problems though: if the name of the function contains an underscore it won't work. And there can be multiple regexes in one line; currently it will only replace the first one. – Earl of Lemongrab Jan 28 '21 at 14:08
  • Why would it not work if the name of the function contained an underscore? Did you try it? It's trivial to deal with multiple possible matches on a line as all we need to do is call match() in a loop but all we have to go on when trying to help you is the sample input/output you provide - if your real data can contain multiple strings to be matched on a single line then show that in the example in your question so we can see it and have something to test against. – Ed Morton Jan 28 '21 at 14:57
  • I see now that I already answered that part of this question for you previously - https://stackoverflow.com/a/65026705/1745001. – Ed Morton Jan 28 '21 at 15:22
  • Ah yes, I only had the other answer there in mind, thank you, it works. Just add an underscore to any of the function names and it won't work. – Earl of Lemongrab Jan 28 '21 at 16:41
  • It **will** work, an underscore is a word-constituent character and so is included in `\w` as used in the gawk version I posted and I explicitly included it in the equivalent bracket expressions I used in the non-gawk version `[[:alnum:]_]`. Please do try it. – Ed Morton Jan 28 '21 at 16:44
  • I tried it multiple times. The function name foo_bar won't work, the name foobar will work. – Earl of Lemongrab Jan 28 '21 at 16:58
  • Then you're doing something wrong. Maybe you're using some character that looks like an underscore but isn't an underscore in your testing or maybe you're putting an underscore in the input but not in the list on the command line or vice-versa, idk. If you update your sample input/output to include that case I'll run my script against it and show the output. – Ed Morton Jan 28 '21 at 17:02
  • You added `bar_foo(void)` to foo.c in your sample input and showed that you don't expect it to be modified in your expected output and it won't because `bar_foo(void)` isn't in the `funcs` list passed to awk. If your bar.c doesn't change as you want then, again, you're doing something wrong (e.g. not including `foo_bar(void)` in the `funcs` list passed to awk) as there is nothing magic about an underscore. I edited the bottom of my answer to show an example with an underscore working. – Ed Morton Jan 28 '21 at 17:12
  • Are you expecting `foobar` in the `funcs` list to match `foo_bar` in your input? If so are you asking for `_` to be treated as some kind of wildcard that can appear anywhere in the function name? If so then please update your question to clarify your requirements around that. – Ed Morton Jan 28 '21 at 17:14
  • I updated the post to show what the problem is and what code I run. – Earl of Lemongrab Jan 29 '21 at 09:06
  • You changed the regexp that matches the function name line in my answer to remove `_` from the bracket expression. I used `[[:alnum:]_]` and you're using `[[:alnum:]]`. – Ed Morton Jan 29 '21 at 13:05
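
The difference between the two bracket expressions can be checked in isolation. The following is a small sketch, assuming a POSIX shell and an awk that supports POSIX character classes; the function name classify_defs and the output labels are made up for illustration. It tests each input line against the regexp from the answer and against the edited one from the question:

```shell
# Hypothetical helper: reads candidate function-definition lines on stdin
# and reports which of the two regexps matches each one.
classify_defs() {
    awk '
    {
        # Regexp from the answer: underscore allowed in the function name.
        answer = ($0 ~ /^[[:space:]]*[[:alnum:]_]+[[:space:]]*[[:alnum:]_]+\([^)]*\)/)
        # Regexp as edited in the question: underscore missing from the name part.
        edited = ($0 ~ /^[[:space:]]*[[:alnum:]_]+[[:space:]]*[[:alnum:]]+\([^)]*\)/)
        printf "%s answer=%d edited=%d\n", $NF, answer, edited
    }'
}

printf 'void foobar_(void)\nvoid foobar(void)\n' | classify_defs
```

With the edited regexp, the name part stops at `foobar` and the following `_` cannot match `\(`, so the `foobar_(void)` line never updates inFunc and its body is left alone.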