0

I have a text file which contains multiple SQL queries that start and end with """ or '''. I am trying to create a regex pattern to capture all such occurrences and extract all the SQL query content between the triple quotes. Below is what I have tried so far using Regex101.com. The problem is, it is finding only the very first occurrence. How can I modify my code to find all matching occurrences?

Below is my code. I am using Python 3.6.

# Example content from the text file
data = """
'''test''',
..................
..................
example text here('''SELECT * FROM table''').format(),
..................
..................
"""

# Creating regex pattern
regex = re.compile(r"(?<=([\"']{3}\b))(?:(?=(\\?))\2.)*?(?=\1)")

# Searching for patterns
pattern = regex.search(data)

# Printing all patterns
if pattern:
    print(pattern.group()) # prints only 'test'

The expected output is as provided below:

[test, SELECT * FROM table]

UPDATE: I modified my regex pattern to ((?:'''|\"\"\")\b)(?:(?=(\\?))\2.)*?(?=\1) and it works for the above two cases. However, I also have multiline patterns for which the code doesn't work. Below are couple of samples for reference. I looked at some of the already asked questions here and here. But I am unable to figure out how to reconstruct my pattern to capture both single line and multi-line patterns. Any help on this would be appreciated as I am completely new to regex.

"""
SELECT * from table
WHERE

A = B

"""

"""SELECT VALUES
FROM table
WHERE score = 0"""
Code_Sipra
  • 1,571
  • 4
  • 19
  • 38
  • try `re.findall(pattern, string)` – Mohammad Zeineldeen Sep 23 '19 at 18:51
  • Try the `re.findall()` function, rather than `.search()`. – S3DEV Sep 23 '19 at 18:51
  • Look into `re.finditer()` – Jan Sep 23 '19 at 18:52
  • I have already tried this - `regex = re.findall(r"(?<=([\"']{3}\b))(?:(?=(\\?))\2.)*?(?=\1)", data)`. But I get an output like this - `[("'''", ''), ("'''", '')]`. The only thing which worked was ~.search()~ – Code_Sipra Sep 23 '19 at 18:53
  • 1
    Ditch the `[\"']{3}`, it matches things you don't want (combinations of three *different* quotes). Just use `(?:'''|\"\"\")`. – hobbs Sep 23 '19 at 18:54
  • 1
    Use `re.finditer`. `[x.group() for x in re.finditer(your_regex, text)]` – Wiktor Stribiżew Sep 23 '19 at 18:55
  • Do you know anything more about the structure of the file you're searching through? For example, is it a syntactically valid Python file? If so, you might have an easier time extracting string literals by using the [`tokenize`](https://docs.python.org/3/library/tokenize.html) module. – Kevin Sep 23 '19 at 18:55
  • 1
    @WiktorStribiżew, re.finditer() worked! Thanks a lot. – Code_Sipra Sep 23 '19 at 19:03
  • @hobbs, I tried it. It gives me this error: `invalid group reference 2'. Here is my regex: `"(?:'''|\"\"\"\b)(?:(?=(\\?))\2.)*?(?=\1)"`. I am not very confident with creating regex patterns. – Code_Sipra Sep 23 '19 at 19:08
  • 1
    @Code_Sipra no, `((?:'''|\"\"\")\b)` etc. – hobbs Sep 23 '19 at 19:48

1 Answers1

-1

Probably you're looking for re.findall()

gabe appleton
  • 368
  • 2
  • 10