I am trying to extract a SQL query from a multi-line text, but I keep getting the wrong output.

How can I get the text between single or triple quotes?

Note: there can be anything before and after the first complete pair of quotes ('...', "...", """...""" or '''...'''), and I am only interested in finding the first text between the quotes.

import re

cell_text = """\
#%%sql
q = \"\"\"
select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name
\"\"\"

f(q)
"""
print(cell_text)

My attempt:

pat = """.*select(.*)['"].*"""
out = re.findall(pat,cell_text,flags=re.M)[0]
sql = 'select ' + out
print(sql)

# re.findall finds no matches here instead of returning the query text.

Required output:

input
----

#%%sql
q = """
select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name
"""

f(q)

output
------

select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name


input
-----
#%%sql
q = "select * from cats;"

f(q)

output
-------
select * from cats;

input
-----
q = 'select * from cats limit 2'

output
------
select * from cats limit 2
halfer
BhishanPoudel

1 Answer

By default . does not match a newline, so you need DOTALL mode, for example via the inline (?s) flag, like this:

>>> print (re.findall(r'(?s)"""(.*?)"""', cell_text)[0])

select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name

You could also pass the flags parameter to re.findall:

re.findall(r'"""(.*?)"""', cell_text, flags=re.DOTALL)
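
To see why the flag matters, here is a minimal sketch comparing the same pattern with and without DOTALL. The shortened cell_text is a stand-in for the one in the question, and re.search is used instead of re.findall because only the first match is wanted; the final .strip() is an added assumption to drop the newlines next to the quotes.

```python
import re

# Shortened stand-in for the cell text from the question (an assumption).
cell_text = '''#%%sql
q = """
select
name, breed
from cats
"""

f(q)
'''

pattern = r'"""(.*?)"""'

# Without DOTALL, "." stops at newlines, so the multi-line body cannot match.
no_flag = re.search(pattern, cell_text)

# With DOTALL, "." also matches "\n", so the lazy group spans the whole query.
with_flag = re.search(pattern, cell_text, flags=re.DOTALL)

print(no_flag)
print(with_flag.group(1).strip())
```

Because re.search stops at the first match, there is no need to index into a list as with re.findall.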

Edit: Note that to match all single or triple quoted text you may use this regex with alternation:

r"""\"\"\"(.*?)\"\"\"|'''(.*?)'''|"(.*?)"|'(.*?)'"""
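
As a rough sketch of how that alternation could be used to get only the first quoted text, the helper below (first_quoted is a hypothetical name, not something from the question) compiles the same four alternatives with DOTALL and returns whichever group actually matched. Stripping the surrounding whitespace is an assumption based on the required output.

```python
import re

# Same alternation as above, split across lines for readability.
# Triple-quoted forms come first so they win over the single-quote forms.
QUOTED = re.compile(
    r'"""(.*?)"""'     # triple double quotes
    r"|'''(.*?)'''"    # triple single quotes
    r'|"(.*?)"'        # one pair of double quotes
    r"|'(.*?)'",       # one pair of single quotes
    flags=re.DOTALL,
)


def first_quoted(text):
    """Return the first quoted text in `text`, or None if there is none."""
    match = QUOTED.search(text)
    if match is None:
        return None
    # Exactly one of the four groups takes part in the match.
    inner = next(g for g in match.groups() if g is not None)
    return inner.strip()


print(first_quoted('q = """\nselect *\nfrom cats\n"""\n\nf(q)'))
print(first_quoted('#%%sql\nq = "select * from cats;"\n\nf(q)'))
print(first_quoted("q = 'select * from cats limit 2'"))
```

Note that this takes the leftmost quote character as the start, so a stray apostrophe or quote earlier in the cell (for example in a comment) would be picked up first.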


anubhava
  • I tried `re.findall(pat,cell_text,flags=re.MULTILINE|re.DOTALL)[0]` but get two extra quotes `""` at the end. How can I avoid those extra trailing quotes? – BhishanPoudel Jul 17 '20 at 18:05
  • @astro123 To get the first match, use `re.search`. `re.findall` returns multiple matches. – Wiktor Stribiżew Jul 17 '20 at 18:06
  • The pattern cannot be `pat=r'"""(.*?)"""'`, since the query can be `q = "myquery"` or `q = """ myquery"""`, i.e. anything between single or triple quotes, not only triple double quotes. – BhishanPoudel Jul 17 '20 at 18:12
  • It is a fix for the code you've shown in the question, where you are using `pat = """.*select(.*)['"].*"""`. Please also provide a sample for single-quote matching. – anubhava Jul 17 '20 at 18:16
  • To match all these cases you may use this regex: `r"""\"\"\"(.+?)\"\"\"|'''(.+?)'''|"(.+?)"|'(.+?)'"""` – anubhava Jul 17 '20 at 18:22