I have a text file that contains multiple SQL queries, each starting and ending with """ or '''. I am trying to create a regex pattern that captures every such occurrence and extracts the SQL query content between the triple quotes. Below is what I have tried so far, using Regex101.com. The problem is that it finds only the very first occurrence. How can I modify my code to find all matching occurrences?
Below is my code. I am using Python 3.6.
import re

# Example content from the text file
data = """
'''test''',
..................
..................
example text here('''SELECT * FROM table''').format(),
..................
..................
"""
# Creating regex pattern
regex = re.compile(r"(?<=([\"']{3}\b))(?:(?=(\\?))\2.)*?(?=\1)")
# Searching for patterns
pattern = regex.search(data)
# Printing all patterns
if pattern:
    print(pattern.group())  # prints only 'test'
The expected output is as provided below:
[test, SELECT * FROM table]
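For reference, `re.search` returns only the first match, while `re.finditer` (or `re.findall`) walks the whole string. The sketch below is one possible way to get the list above. It does not reuse the pattern from the question: `TRIPLE_QUOTED` and `extract_all` are hypothetical names, and the simplified pattern assumes the queries contain no escaped or nested triple quotes.

```python
import re

# Same sample content as in the question (shortened).
data = """
'''test''',
..................
example text here('''SELECT * FROM table''').format(),
..................
"""

# Assumed simplified pattern (not the original one):
# group 1 = opening delimiter (''' or """),
# group 2 = shortest possible body,
# \1      = the same delimiter that opened the string.
TRIPLE_QUOTED = re.compile(r"('''|\"{3})(.*?)\1")


def extract_all(text):
    """Return the body of every triple-quoted string found in text."""
    # finditer yields one match object per occurrence, unlike search.
    return [m.group(2) for m in TRIPLE_QUOTED.finditer(text)]


print(extract_all(data))  # ['test', 'SELECT * FROM table']
```

Because `.` does not match a newline by default, this version only finds strings that open and close on the same line.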
UPDATE:
I modified my regex pattern to ((?:'''|\"\"\")\b)(?:(?=(\\?))\2.)*?(?=\1) and it works for the above two cases. However, I also have multi-line patterns for which the code doesn't work. Below are a couple of samples for reference. I looked at some of the previously asked questions here and here, but I am unable to figure out how to restructure my pattern to capture both single-line and multi-line patterns. Any help would be appreciated, as I am completely new to regex.
"""
SELECT * from table
WHERE
A = B
"""
"""SELECT VALUES
FROM table
WHERE score = 0"""
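One possible direction for the multi-line samples is the `re.DOTALL` flag, which lets `.` match newlines. This is only a sketch under assumptions: `MULTILINE_QUOTED` and `extract_queries` are hypothetical names, the pattern is a simplified one rather than the pattern from the question, and it assumes every opening delimiter has a matching closing one (an unpaired delimiter would pair up with the next string).

```python
import re

# Assumed simplified pattern; re.DOTALL makes "." match newlines too,
# so the lazy body (group 2) can span several lines.
MULTILINE_QUOTED = re.compile(r"('''|\"{3})(.*?)\1", re.DOTALL)

# The two multi-line samples from the question, joined into one string.
sample = (
    '"""\n'
    'SELECT * from table\n'
    'WHERE\n'
    'A = B\n'
    '"""\n'
    '"""SELECT VALUES\n'
    'FROM table\n'
    'WHERE score = 0"""\n'
)


def extract_queries(text):
    """Return each triple-quoted body, single-line or multi-line."""
    # strip() drops the newlines left when the quotes sit on their own lines.
    return [m.group(2).strip() for m in MULTILINE_QUOTED.finditer(text)]


for query in extract_queries(sample):
    print(repr(query))
```

The same function also handles the single-line cases, since a lazy `.*?` stops at the first closing delimiter whether or not a newline comes before it.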