I am trying to match C function names using a regex that works just fine on quite large, complex files. However, it gets stuck on one specific line and appears to go into an infinite loop.

regex:

import re

# Raw string so that \d, \s and \( reach the regex engine unchanged.
func_re = re.compile(r"^((?P<line_num>\d+)  END_LINE_NUM)*\s*(static)*\s*[a-z0-9_]+\s*\**\s+(?P<func_name>[a-z0-9_]+)\s*\((?P<func_arg>\s*\s*[a-z0-9_]+\**\s*(?:[a-z0-9_]+)\s*,?)*\s*\)\s*{?\s*$", re.MULTILINE | re.I | re.S)
for m in func_re.finditer(string):
    print("found...")

string contains the following:

1  END_LINE_NUM            ( this_is_a_test_function_exact_size( p_point_er->member ) == FALSE ) &&
2  END_LINE_NUM

Note: I am adding this number and identifier at the beginning of each line so that I can get the line numbers where functions start.
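For illustration, the prefixing step described in the note might look roughly like this. The helper name add_line_markers is hypothetical, and the exact marker layout (number, two spaces, END_LINE_NUM, then the untouched line) is an assumption inferred from the sample above:

```python
def add_line_markers(source: str) -> str:
    """Prefix every line with '<n>  END_LINE_NUM' (1-based line number).

    Assumption: the marker is glued directly onto the original line,
    so any indentation of the original line follows the marker.
    """
    return "\n".join(
        f"{number}  END_LINE_NUM{line}"
        for number, line in enumerate(source.split("\n"), start=1)
    )


print(add_line_markers("int a;\nint b;"))
```

With markers like these, the line_num group of a match gives the line on which the matched text starts.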

Shadi

1 Answer

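As Wiktor pointed out, it's catastrophic backtracking. Because the line-number group was optional, the engine could skip it, take 1 as the return type and END_LINE_NUM as the function name, and then try every way of splitting this_is_a_test_function_exact_size across repetitions of the func_arg group before giving up.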
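A rough way to see the effect is to time the func_arg group on its own against inputs that cannot match. The sketch below is not part of the original code: the name time_failed_match and the test inputs are made up for the demonstration, and the lengths are kept small on purpose. On a backtracking engine such as Python's re, the time should climb steeply as the identifier gets longer.

```python
import re
import time

# The func_arg group from the question, placed between literal parentheses.
# Case-insensitive like the original; the group is made non-capturing here.
arg_list = re.compile(
    r"\((?:\s*\s*[a-z0-9_]+\**\s*(?:[a-z0-9_]+)\s*,?)*\s*\)", re.I
)


def time_failed_match(n):
    """Time a match attempt on '(' + n word characters + '(' (never matches)."""
    text = "(" + "a" * n + "("
    start = time.perf_counter()
    result = arg_list.match(text)
    return result, time.perf_counter() - start


# Small lengths only: the identifier in the question is 34 characters long.
for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    print(n, result, f"{seconds:.4f}s")
```

The identifier can be divided between the two [a-z0-9_]+ parts and between repetitions of the group in a very large number of ways, and the engine has to rule out each one before it can report a failure.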

This simple fix helped: remove the * that followed the first group, ((?P<line_num>\d+)  END_LINE_NUM), so that the line-number prefix has to match exactly once:

func_re = re.compile(r"^((?P<line_num>\d+)  END_LINE_NUM)\s*(static)*\s*[a-z0-9_]+\s*\**\s+(?P<func_name>[a-z0-9_]+)\s*\((?P<func_arg>\s*\s*[a-z0-9_]+\**\s*(?:[a-z0-9_]+)\s*,?)*\s*\)\s*{?\s*$", re.MULTILINE | re.I | re.S)
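As a quick check, the sketch below runs the pattern with the mandatory prefix group over the two lines from the question plus one function definition. The line with my_func is hypothetical and added only for this example; the pattern is split across several raw strings purely for readability.

```python
import re

# Same pattern, with no * after the line-number group (the prefix is required).
func_re = re.compile(
    r"^((?P<line_num>\d+)  END_LINE_NUM)\s*(static)*\s*[a-z0-9_]+\s*\**\s+"
    r"(?P<func_name>[a-z0-9_]+)\s*\("
    r"(?P<func_arg>\s*\s*[a-z0-9_]+\**\s*(?:[a-z0-9_]+)\s*,?)*"
    r"\s*\)\s*{?\s*$",
    re.MULTILINE | re.I | re.S,
)

text = (
    "1  END_LINE_NUM            ( this_is_a_test_function_exact_size( p_point_er->member ) == FALSE ) &&\n"
    "2  END_LINE_NUM\n"
    # Hypothetical function definition, not from the original post.
    "3  END_LINE_NUM static int my_func(int a, int b) {\n"
)

found = [(m.group("line_num"), m.group("func_name")) for m in func_re.finditer(text)]
print(found)
```

This should return promptly and print [('3', 'my_func')]. Note that the func_arg group still contains nested quantifiers, so this change avoids the slow path for marker-prefixed input like the above rather than removing the ambiguity from the pattern itself.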
Shadi