Why does applying the regular expression (rx) to the data (d) give the output (o)?
Regular expression (rx):

s/(?<!\#include)[\s]*\<[\s]*([^\s\>]*)[\s]*\>/\<$1\>/g

Data (d):

#include  <a.h>  // 2 spaces after e

Output (o):

#include <a.h>  // 1 space is still there

Expected output is:

#include<a.h>  // no space after include

ikegami
    tip: `[\s]` is pointless. `[]` is for grouping MULTIPLE characters into a single match point. `[\s]*` is functionally identical to `\s*`. – Marc B Jul 31 '13 at 15:15

3 Answers

The condition (?<!\#include) is true as soon as you've passed the first of the two spaces, so the match starts there.

#include  <a.h>
         ^^^^^^- matched by your regex.

That means the first space is not part of the match, so your replace operation leaves it in place.
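
To see where the match starts, here is a minimal sketch that runs the pattern from the question (without the substitution) and reports the offsets. The helper name find_match is made up for illustration; @- and @+ are Perl's built-in match offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset and the matched text
# for the pattern from the question, or an empty list if nothing matches.
sub find_match {
    my ($line) = @_;
    if ($line =~ /(?<!\#include)[\s]*\<[\s]*([^\s\>]*)[\s]*\>/) {
        # $-[0] and $+[0] hold the start and end offsets of the whole match.
        return ($-[0], substr($line, $-[0], $+[0] - $-[0]));
    }
    return;
}

my ($start, $text) = find_match('#include  <a.h>');   # two spaces after "include"
print "match starts at offset $start: '$text'\n";
# prints: match starts at offset 9: ' <a.h>'
```

Offset 8 is the first space and offset 9 is the second, so only the second space is consumed and replaced.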

If you use a positive lookbehind assertion instead, you get the desired result:

s/(?<=#include)\s*<\s*([^\s>]*)\s*>/<$1>/g;

which can be rewritten to use the more efficient \K:

s/#include\K\s*<\s*([^\s>]*)\s*>/<$1>/g;
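
As a quick check, the following sketch applies both substitutions to the sample line from the question and to one extra line that I made up for illustration. The sub names are hypothetical, and \K assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Positive lookbehind version.
sub fix_with_lookbehind {
    my ($s) = @_;
    $s =~ s/(?<=#include)\s*<\s*([^\s>]*)\s*>/<$1>/g;
    return $s;
}

# \K version: "#include" is matched but kept out of the replaced text.
sub fix_with_keep {
    my ($s) = @_;
    $s =~ s/#include\K\s*<\s*([^\s>]*)\s*>/<$1>/g;
    return $s;
}

for my $line ('#include  <a.h>  // 2 spaces after e', '#include < b.h >') {
    print fix_with_lookbehind($line), "\n";
    print fix_with_keep($line), "\n";
}
# Both subs print:
#   #include<a.h>  // 2 spaces after e
#   #include<b.h>
```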
Tim Pietzcker

(?<!\#include)[\s] is a whitespace character that is not directly preceded by #include. The first space in #include  <a.h> is directly preceded by #include, so it isn't matched. The second one isn't (it's preceded by the other space), so that's where the match starts.
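
The same point can be checked character by character. This is only a sketch, with a made-up sub name, that collects the offsets of every whitespace character that passes the negative lookbehind:

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of whitespace characters that are
# NOT directly preceded by "#include".
sub unguarded_space_offsets {
    my ($s) = @_;
    my @offsets;
    while ($s =~ /(?<!\#include)\s/g) {
        push @offsets, $-[0];   # start offset of the current match
    }
    return @offsets;
}

print join(',', unguarded_space_offsets('#include  <a.h>')), "\n";
# prints: 9
```

The space at offset 8 is rejected by the lookbehind, and the one at offset 9 is accepted.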

sepp2k

As an aside, you can use this pattern, which doesn't use a lookbehind:

s/(?:#include\K|\G)(?:\s+|(<|[^\s><]+))/$1/g

pattern details:

(?:              # open a non-capturing group
    \#include\K  # match "#include" and drop it from the match result
                 # (the "#" must be escaped when the /x modifier is used)
  |              # OR
    \G           # a contiguous match (starts where the previous match ended)
)                # close the non-capturing group
(?:
    \s+          # whitespace characters (spaces or tabs here)
  |              # OR
    (            # capturing group
        <
      |
        [^\s><]+ # content inside the brackets, except spaces (and brackets)
    )
)

The search stops at the closing bracket, since the pattern does not describe it, and there are no more contiguous matches until the next #include.
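
Here is a small sketch of that pattern in action. The sub names and the guarded variant are my own additions, not part of the answer. One caveat, as far as I can tell: before any match has been made, \G also matches at the very start of the string, so a string that does not begin with #include loses its whitespace too. Adding (?!\A) after \G is one common way to avoid that. Both \K and the // operator assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# The pattern as posted. $1 is undefined when the \s+ branch matches,
# so the replacement falls back to '' to stay quiet under "use warnings".
sub squeeze_include {
    my ($s) = @_;
    $s =~ s{(?:#include\K|\G)(?:\s+|(<|[^\s><]+))}{ $1 // '' }ge;
    return $s;
}

# Hypothetical variant: (?!\A) stops \G from matching at the start of the string.
sub squeeze_include_guarded {
    my ($s) = @_;
    $s =~ s{(?:#include\K|\G(?!\A))(?:\s+|(<|[^\s><]+))}{ $1 // '' }ge;
    return $s;
}

print squeeze_include('#include  <a.h>  // 2 spaces after e'), "\n";
# prints: #include<a.h>  // 2 spaces after e
print squeeze_include('int x;'), "\n";
# prints: intx;
print squeeze_include_guarded('int x;'), "\n";
# prints: int x;
```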

Casimir et Hippolyte