0

I need a regex that matches to both of these strings:

CREATE TEMPORARY TABLE db.table (cols)USING parquet PARTITIONED BY (DATA2, DATA3)   

CREATE TABLE db.table (cols)USING parquet

The closest I've got is this:

CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING.*?(PARTITIONED BY \((?P<pcol>.*?)\))

But that doesn't match to the second string. I've tried using a ? on the end but that didn't help. basically I've been playing around with this for hours now and can't figure it out, so I'm resorting to SO.

I've set up a demo of this here: https://regex101.com/r/ffSVuD/1 If anyone feels game enough to try and solve it, be my guest!

Zenoo
  • 12,670
  • 4
  • 45
  • 69
jamiet
  • 10,501
  • 14
  • 80
  • 159

1 Answers1

1

I ended up using CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+) *(PARTITIONED BY \((?P<pcol>.*?)\))? to match both your examples.

Basically, I replaced USING.*? by USING +(\[^\s\]+) *, so that you don't end up with a .*? before your last group.

Finally, I added a ? after your last group to make it optional.

Zenoo
  • 12,670
  • 4
  • 45
  • 69
  • fantastic, thank you @Zenoo. I figured the problem was something to do with that word "parquet" but couldn't figure out why, thx for figuring it out. Still don't understand why it works tho. What do you mean by "so that you don't end up with a .*? before your last group"? Why is that a problem? – jamiet Mar 09 '18 at 06:45
  • 1
    @jamiet The `.*?` before your last group was looking for any character X times, without the `?` at the end of your Regex, it was working fine since you told the RegEx you wanted something very specific after. But as soon as you add the last `?`, `.*?` is now capturing everything after `USING`, the last group can never be reached. – Zenoo Mar 09 '18 at 07:57
  • Thank you again. If you're still up for the challenge then the problem has increased in complexity slightly and I am still incapable of figuring it out. We now have extra optional text between the "USING parquet" & "PARTITIONED BY" https://regex101.com/r/ffSVuD/4 – jamiet Mar 09 '18 at 08:02
  • @jamiet Alright, let me wrap my head around this one, I'll get back to you when I figure it out. – Zenoo Mar 09 '18 at 08:03
  • 1
    @jamiet If you don't really care about what's inside the `OPTIONS`, [`CREATE +(TEMPORARY +)?TABLE *(?P.*?\.)?(?P.*?)\((?P.*?)\).*?USING +([^\s]+)( +OPTIONS *\([^)]+\))?( *PARTITIONED BY \((?P.*?)\))?`](https://regex101.com/r/ffSVuD/5) should work.
    – Zenoo Mar 09 '18 at 08:08
  • fantastic, can't thank you enough. I can't wait for the day when I can properly wrap my head around this stuff. – jamiet Mar 09 '18 at 08:27
  • No problem. With RegExs, just go through your expressions step by step, trying to understand why this works or doesn't. You should be able to solve any RegEx in no time ! – Zenoo Mar 09 '18 at 08:29
  • Well the problem I have now is that while i works at http://regex101.com, when I plug the same regex into my code (python) it doesn't return a match for ``. I'll get a repro together and post another thread. Thanks again. – jamiet Mar 09 '18 at 08:36
  • @jamiet Good luck solving your issue. Can't help you with `Python`, sorry. – Zenoo Mar 09 '18 at 08:39
  • On the off-chance that anyone surfs in here and has the same issue as me, I've posted about it here: https://stackoverflow.com/questions/49191546/inconsistent-regex-behaviour-in-python – jamiet Mar 09 '18 at 10:27