
I want to grab the entire text after the first opening parenthesis that follows a specific pattern, e.g. x.set(, up to the closing parenthesis that matches it. This should work across lines, consuming as much text as needed until the matching closing parenthesis is found. Example string:

"ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"

The result I am looking for (using re.findall) is:

find_all_res  = [['1.2'], ['python_version', None], ['test_template', DEFAULT, p(x,b),\nz()]]

Currently I'm using:

```python
re.findall(pattern=r"(?<![0-9a-zA-Z_])x.set([\s\S]+?)(?<=[)])(\s)", string=value)
```

And the result I get:

find_all_res  = [[("('1.2'):\n        p = x.set('python_version')", '\n'), ("('test_template', DEFAULT, p(x,b),\n        z())", '\n')]
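The core problem can be reproduced on a smaller input: a lazy quantifier stops at the first `)` it meets, while a greedy one runs to the last `)` in the input, so neither finds the *matching* parenthesis. A minimal sketch with the standard `re` module; the input strings are made up for illustration:

```python
import re

# Lazy: stops at the first ")" it meets, which closes p(x,b), not x.set(
lazy = re.findall(r"x\.set\((.*?)\)", "x.set('a', p(x,b), z())")

# Greedy: runs to the last ")" in the input, swallowing the next call too
greedy = re.findall(r"x\.set\((.*)\)", "x.set(1) + x.set(2)")

print(lazy)    # ["'a', p(x,b"]
print(greedy)  # ['1) + x.set(2']
```

Matching the right `)` requires counting nesting depth, which plain `re` patterns cannot do.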

UPDATE:

Last 3 cases

weis_ss
  • Note that balanced open/close parentheses are not a regular language. While ``re`` is more powerful than regular languages, the non-regular features are usually relatively unwieldy and some things are outright impossible. Consider using an actual parser **if** you need to extend this with more complex rules in the future. – MisterMiyagi Oct 19 '21 at 13:30
  • Does this answer your question? [Python regex: matching a parenthesis within parenthesis](https://stackoverflow.com/questions/5357460/python-regex-matching-a-parenthesis-within-parenthesis) – MisterMiyagi Oct 19 '21 at 13:31
  • No, it should work in multiline mode. – weis_ss Oct 19 '21 at 13:37
  • Maybe I should have put the example string in quotes. The entire example string is now in quotes ("ver (...) z())"). I read it from a txt file, so it doesn't matter what types the values have: it is loaded as a string and I don't run it as code, with variables etc. The point was to show everyone which pattern I need to parse. I have added some quotes above. – weis_ss Oct 19 '21 at 13:47
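Following the suggestion to use an actual parser: assuming the text file contains syntactically valid Python, the standard-library `ast` module can locate the calls and return each argument's source text, including nested calls, f-strings and dotted names such as `instance.var`. This is a sketch only: `find_method_args` is a hypothetical helper name, it needs Python 3.8+ for `ast.get_source_segment`, it only matches calls whose receiver is a plain name (`x.set`, `runner.setting`, but not `self.runner.setting`), and it skips keyword arguments.

```python
import ast

SOURCE = """ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"""


def find_method_args(source, obj_name="x", method="set"):
    """Return the source text of the positional arguments of each
    obj_name.method(...) call, in source order (hypothetical helper)."""
    tree = ast.parse(source)
    calls = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == method
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == obj_name
    ]
    # ast.walk() is breadth-first, so sort by position to restore source order.
    calls.sort(key=lambda node: (node.lineno, node.col_offset))
    # Keyword arguments live in call.keywords and are ignored here.
    return [[ast.get_source_segment(source, arg) for arg in call.args] for call in calls]


print(find_method_args(SOURCE))
```

The parser does the bracket balancing, so string literals containing unpaired `(` or `)` are not a problem; the trade-off is that the input must parse as Python.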

1 Answer


You can install the third-party PyPI `regex` library (`pip install regex`) and use:

```
\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)
```

See the regex in action. Details:

  • \b - a word boundary
  • x\.set\( - x.set( string
  • (?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))* - zero or more occurrences of:
    • \s*(?:,\s*)? - zero or more whitespaces, and then an optional occurrence of , and zero or more whitespaces
    • (?<o> - Group "o" (it will contain all the strings you need):
      • [-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?| - a number pattern, or
      • \w+(?<a>\((?:[^()]++|(?&a))*\))* - one or more word chars, and then zero or more (...) substrings with any amount of nested parentheses, or
      • '[^'\\]*(?:\\.[^'\\]*)*'| - a single quoted string literal with escape sequence support, or
      • "[^"\\]*(?:\\.[^"\\]*)*" - a double quoted string literal with escape sequence support
    • ) - end of group
  • \s* - zero or more whitespaces
  • \) - a ) char.
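Only part of this pattern actually depends on the `regex` module: the recursive `(?&a)` call and the `captures()` method used for repeated group values are not available in the standard `re` module. The non-recursive alternatives of group "o" can be checked with plain `re`; the test strings below are illustrative:

```python
import re

# Non-recursive alternatives of group "o", copied from the pattern above.
NUMBER = r"[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?"
SINGLE_QUOTED = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE_QUOTED = r'"[^"\\]*(?:\\.[^"\\]*)*"'

print(bool(re.fullmatch(NUMBER, "-1.5e3")))                   # True
# An escaped quote and an unpaired "(" inside the literal are both fine:
print(bool(re.fullmatch(SINGLE_QUOTED, r"'it\'s (tricky'")))  # True
# An unescaped quote in the middle ends the literal early:
print(bool(re.fullmatch(DOUBLE_QUOTED, '"a"b"')))             # False
```

Because string literals are consumed as a whole, parentheses inside them never reach the nested-parentheses part of the pattern.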

See a Python demo:

```python
import regex  # third-party module: pip install regex

text = r"""ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"""
rx = r'''\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)'''
# captures("o") returns every value group "o" matched in a call, not just the last one
print( [m.captures("o") for m in regex.finditer(rx, text, regex.S)] )
```

Output:

[["'1.2'"], ["'python_version'", 'None'], ["'test_template'", 'DEFAULT', 'p(x,b)', 'z()']]
Wiktor Stribiżew
  • How can I add support for f-strings, and for variables without quotes, as x.set arguments? If I have x.set(f'some_string{var}.smth') it returns ['f', "'some_string{var}.smth'"], but it should return ["f'some_string{var}.smth'"] or ["'some_string{var}.smth'"]. And if x.set(variable) takes a variable without quotes, it doesn't work. – weis_ss Oct 20 '21 at 09:41
  • @weis_ss I am not sure matching f-strings with regex is safe. If you parse Python code from within Python, you should think of using a dedicated library. – Wiktor Stribiżew Oct 20 '21 at 09:49
  • I parse Python code from a txt file; I can't parse it from within Python code (I would have to import the entire project into one file, and I don't know if that would work). – weis_ss Oct 20 '21 at 10:05
  • @weis_ss Try something like [this](https://tio.run/##dZJdS8MwFIbv@ytCLkxOW4sfd2NrEJxXIsIUhKYbnUu3gv1K4qxu@@319AP1xkJPkvM@b05O0@rT7srium2zvCq1JVptVeNY1VgyI5pSulcaZ@wyuGBOExhlecpMmauVsTortod9ok@Bye2OAfkB2D8I66AsJQOHm14xmDgEnwqLjNmqP9IKC5usLJhPHspCQU81v5RVxq6syqu3xCqEbud3N8/3Tz6peOOvwe/5Lw6APTi6M2osL9eN7PySO1xMpHEx@jiAIGjgYlqGvZGkOK/rsIthxGgMCHLxOKvhcHUC4UEkjVzEruiTNRwHG4uWTMq421bKYFyAy0aZRkv6V6aDTEc5OvdiITeuDDB6CEVzFY9JD8RIyQ@UpkkoeUcsOcSed@TiLAFwJb5IdRG7koAtOxVeguUkaoLXpLLvWhlOSwokLTV@0KwY7jxIs2KTWaW5bnzS/QD@KCyOw/gCMYG2/QY). See [regex demo](https://regex101.com/r/ivEfA7/2). – Wiktor Stribiżew Oct 20 '21 at 10:55
  • Yeah, it's better now, but it still doesn't find "x.set(variable_name)" – weis_ss Oct 20 '21 at 11:12
  • @weis_ss It [finds it alright](https://tio.run/##dZJdS8MwFIbv@ytCLkxOW4vOu7EtCM4rEWEKQtONzqVbYf1YEmd122@fJ21RbyzkJDnv8/bktKk/7aYqb87nvKgrbYlWa9V4VjWWjImmlO6VxhW7jq6Y10RGWb5PdZ4ut2pRpoWCPpkxUxVqYazOy/UBkVNkCrthQH4A9g/CHJRnpOOw0oDB0CP41Fi5z9btORd4GpNXJQvJY1VicUc1v5RVxi6sKuptahVCd9P725eH55DUvAmXELb8FwfAxjztjBrLy2UjnV9yj4uhND7GECcQBA1cjKpJayQZrne7iYuTmNEEEOTiabyDw@AEIoBYGjlLfNEmd3DsbCyeMykT91opo34DPutlGs/pX5l2Mu3l@DJIhFz5MsIYIBRPVdInAxA9JT9QGqUTyR0x55AEwZGLixTAlziQchG7koAtOxVeguUkaoLXpLLvWhlOSwokLTV@0KwY7jxIs2KTWaW5bnzS/QD@KCyOw/gCMYG2/QY). – Wiktor Stribiżew Oct 20 '21 at 11:14
  • Yes, you're right, but if I pass an instance attribute, e.g. instance.var, it doesn't work. – weis_ss Oct 20 '21 at 11:25
  • @weis_ss I can hardly follow you; probably all you are asking for now is to replace `\w+` with `\w+(?:\.\w+)*` – Wiktor Stribiżew Oct 20 '21 at 11:28
  • There are two more cases I found: https://cutt.ly/PRzDF93 Isn't there a way to "just" get the entire text between runner.setting(*) and its corresponding closing bracket, even across multiple lines? – weis_ss Oct 20 '21 at 14:36
  • @weis_ss If there are string literals containing unpaired `(` or `)`, then no. Otherwise, you already have it: `(?<a>\((?:[^()]++|(?&a))*\))` – Wiktor Stribiżew Oct 20 '21 at 14:56
  • I have updated the main post with the question (added a link to TIO); it would be nice if you could have a look. – weis_ss Oct 20 '21 at 15:17
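If the goal is just the raw text between `runner.setting(` (or `x.set(`) and its matching `)`, as asked in the comments, a small hand-written scanner is an alternative to a regex. It counts parenthesis depth and ignores parentheses inside `'...'` and `"..."` literals, which sidesteps the unpaired-parenthesis caveat above. This is a sketch of a hypothetical helper, `extract_call_bodies`, under simplifying assumptions: backslash escapes are honoured, but comments, triple-quoted strings and a matching call nested inside another matched call are not handled.

```python
def extract_call_bodies(text, prefix="x.set("):
    """Return the raw text between each `prefix` and its matching ")".

    Parentheses inside '...' or "..." literals are ignored. Comments,
    triple-quoted strings and nested matches are NOT handled (sketch only).
    """
    bodies = []
    start = text.find(prefix)
    while start != -1:
        before = text[start - 1] if start > 0 else ""
        if before.isalnum() or before in ("_", "."):
            # Part of a longer name such as "index.set(", so skip it.
            start = text.find(prefix, start + 1)
            continue
        i = body_start = start + len(prefix)
        depth = 1          # we are already inside the opening "("
        quote = None       # quote char of the string literal we are in, if any
        while i < len(text) and depth:
            ch = text[i]
            if quote:
                if ch == "\\":
                    i += 1         # skip the escaped character
                elif ch == quote:
                    quote = None   # end of the string literal
            elif ch in ("'", '"'):
                quote = ch
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth == 0:
            # i is just past the matching ")", so exclude it from the body.
            bodies.append(text[body_start:i - 1])
        start = text.find(prefix, i)
    return bodies


SOURCE = """ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"""

print(extract_call_bodies(SOURCE))
print(extract_call_bodies("runner.setting('a)b', f(1))", prefix="runner.setting("))
```

Unbalanced input (a call that is never closed) simply produces no entry for that call.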