I tried to compare the performance of capturing and non-capturing groups in a regex. The difference between the two turned out to be very slight. Is this result normal?

[root@Sensor ~]# ll -h sample.log
-rw-r--r-- 1 root root 21M Oct 20 23:01 sample.log

[root@Sensor ~]# time grep -ciP '(get|post).*' sample.log
20000

real    0m0.083s
user    0m0.070s
sys     0m0.010s

[root@Sensor ~]# time grep -ciP '(?:get|post).*' sample.log
20000

real    0m0.083s
user    0m0.077s
sys     0m0.004s
anubhava
Mr.kang

2 Answers

Typically, non-capturing groups perform better than capturing groups, because they require less allocation of memory, and do not make a copy of the group match. However, there are three important caveats:

  • The difference is typically very small for simple, short expressions with short matches.
  • Starting a program such as grep itself takes a significant amount of time and memory, which may overwhelm any small improvement gained from using non-capturing groups.
  • Some languages implement capturing and non-capturing groups in the same way, causing the latter to give no performance improvement.
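One way to make the first two caveats visible is to repeat each command many times and to time a run against an empty input as a baseline. The following is only a minimal sketch, not a rigorous benchmark: the synthetic log, the line count, the `runs` value, and the `bench` helper are all hypothetical stand-ins, and it assumes GNU grep built with `-P` support and GNU `date` (for `%N`).

```shell
# Hypothetical sketch: file contents, sizes, and helper names are assumptions.
log=$(mktemp)   # stand-in for sample.log
runs=20         # repetitions per measurement

# Build a small synthetic log so the script is self-contained.
for i in $(seq 1 2000); do
    printf 'GET /index/%d HTTP/1.1\n' "$i"
done > "$log"

# Read the file once so later runs are likely served from the OS page cache.
cat "$log" > /dev/null

# bench PATTERN FILE: run grep $runs times, leaving the last match count
# in $count and the total elapsed wall time (milliseconds) in $elapsed_ms.
bench() {
    pattern=$1
    file=$2
    n=0
    start=$(date +%s%N)
    while [ "$n" -lt "$runs" ]; do
        count=$(grep -ciP "$pattern" "$file")
        n=$((n + 1))
    done
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
}

# Baseline: empty input, so this is mostly process start-up cost.
bench '(get|post).*' /dev/null
baseline_count=$count; baseline_ms=$elapsed_ms

bench '(get|post).*' "$log"
capturing_count=$count; capturing_ms=$elapsed_ms

bench '(?:get|post).*' "$log"
non_capturing_count=$count; non_capturing_ms=$elapsed_ms

echo "baseline (empty input): ${baseline_ms} ms for $runs runs"
echo "capturing:              ${capturing_ms} ms, $capturing_count matches"
echo "non-capturing:          ${non_capturing_ms} ms, $non_capturing_count matches"

rm -f "$log"
```

If the baseline figure is close to the other two, most of the measured time is start-up overhead rather than matching, and any difference between the two group types is likely to be lost in the noise.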
Pi Marillion
  • Since you mention the load-time of the binary, you might want to mention that file I/O is fairly slow when compared to CPU speed as well. Hopefully, the whole file `sample.log` was in the OS's I/O cache during both of the invocations of `grep`. – Christopher Schultz Jun 19 '20 at 11:14
If many capturing groups are used, the difference seems to be larger.

Thanks, everyone. :)

[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+" sample.log
20000

real    0m0.057s
user    0m0.051s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+" sample.log
20000

real    0m0.061s
user    0m0.053s
sys     0m0.006s
[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+(get|post)" sample.log
1880

real    0m0.839s
user    0m0.833s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+(?:get|post)" sample.log
1880

real    0m0.744s
user    0m0.741s
sys     0m0.003s
Mariano
Mr.kang