I have a regular expression that I have verified works correctly, proof is here: https://regex101.com/r/ffSVuD/6
Unfortunately when I use the same regex within some Python code I do not get the same behaviour. The regex does get a match, but it does not find the same match groups.
Here is some demo code:
import re
ddl_string = """
CREATE TABLE default.test_parse_partitioned_table__using_parquet_1_082921496561 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
serialization.format \\'1\\'
)
PARTITIONED BY (DATA2, DATA3)
"""
regex = r'CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+)( +OPTIONS *\([^)]+\))?( *PARTITIONED BY \((?P<pcol>.*?)\))?'
match = re.search(regex, ddl_string, re.MULTILINE | re.DOTALL)
if match.group("pcol"):
print match.group("pcol").strip()
else:
print 'did not find any pcols in {matches}'.format(matches=match.groups())
which returns:
did not find any pcols in (None, 'default.', 'test_parse_partitioned_table__using_parquet_1_082921496561 ', 'DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT', 'parquet', None, None, None)
My intention is to populate DATA2, DATA3 into match.group("pcol")
but as you will observe that is not happening. In my aforementioned regex verification at https://regex101.com/r/ffSVuD/6 it does find a match:
I have fiddled around quite a lot trying to get a regex that will return what I need but no success hence this post. Can anyone help?