
I am writing a mini SQL parser to estimate the maximum length of the value that an operation or function will return, e.g. round(column, 2). For this I am using regular expressions; for that example I came up with the regular expression `round\((\w+)(,\s*(\d+))?\)`.
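Here is a minimal sketch of how that `round` pattern could be applied with Python's `re` module. `parse_round` is a hypothetical helper, not part of the question. As written, the pattern expects no space between `round` and the opening parenthesis, so `round (price, 2)` would not match.

```python
import re

# The question's pattern: group 1 is the column name, group 3 the optional
# precision (group 2 also contains the leading comma).
ROUND_RE = re.compile(r"round\((\w+)(,\s*(\d+))?\)")

def parse_round(expr):
    """Hypothetical helper: return (column, precision) for round(...), else None.

    precision is None when the second argument is omitted.
    """
    m = ROUND_RE.fullmatch(expr)
    if m is None:
        return None
    column, precision = m.group(1), m.group(3)
    return column, int(precision) if precision is not None else None

print(parse_round("round(price, 2)"))  # ('price', 2)
print(parse_round("round(price)"))     # ('price', None)
```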

However, I came across these cases:

column1||column2||column3||... columnn
concat(column1, column2, ... columnn)

For the first case I tried regexes like these (although I suspected they wouldn't work):

(\w+\|\|\w+)+
(\w+\|\|\w+\|\|)|(\|\|\w+\|\|\w+)
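A short sketch of why those two attempts fail, using Python's `re` and single-letter names (`a`, `b`, `c`) chosen only for illustration. The first pattern repeats whole `word||word` pairs, so it can't continue past the second name. The second pattern isn't repeated at all, so it covers at most two names and leaves a dangling `||`.

```python
import re

expr = "a||b||c"  # hypothetical three-name chain

# Attempt 1: each repetition must be a complete "word||word" pair; after "a||b"
# the next repetition would have to start with a word, but "||c" remains.
attempt1 = r"(\w+\|\|\w+)+"
print(re.match(attempt1, expr).group(0))  # a||b
print(re.fullmatch(attempt1, expr))       # None

# Attempt 2: a single alternation covers at most two words plus one extra "||".
attempt2 = r"(\w+\|\|\w+\|\|)|(\|\|\w+\|\|\w+)"
print(re.match(attempt2, expr).group(0))  # a||b||
print(re.fullmatch(attempt2, expr))       # None
```

With longer names, `re.fullmatch` on the first pattern can even succeed by splitting a name in the middle (for `column1||column2||column3` it can treat `column1||column` and `2||column3` as the two pairs). That is a different way of getting it wrong.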

What regex would you propose to match the cases above? Or, as a more general question: how can I tell whether n strings are joined by a specific separator?
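For the general question, here is a minimal sketch. It assumes Python's `re` module and assumes that no item can itself contain the separator. A chain of one or more items joined by a separator has the shape `item(sep item)*`, so the separator can only appear between items. `re.escape` turns a literal separator such as `||` into a safe regex. `joined_pattern` and `split_joined` are hypothetical helpers.

```python
import re

def joined_pattern(item, sep):
    """Hypothetical helper: regex for one or more `item`s separated by literal `sep`.

    Shape: item (sep item)*  , so `sep` never appears at the start or end.
    """
    return re.compile(rf"(?:{item})(?:{re.escape(sep)}(?:{item}))*")

def split_joined(text, item, sep):
    """Hypothetical helper: list of items if all of `text` is such a chain, else None.

    Assumes an item can never contain `sep`, so a plain split is safe.
    """
    if joined_pattern(item, sep).fullmatch(text) is None:
        return None
    return text.split(sep)

print(split_joined("column1||column2||column3", r"\w+", "||"))
# ['column1', 'column2', 'column3']
print(split_joined("column1||column2||", r"\w+", "||"))  # None
```

The length of the returned list is the n in the question.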

Dante S.
  • Don't use a regex to parse SQL, or any other non-regular language. At least, don't do it if you are going to run this on SQL you haven't generated yourself... – 2e0byo Oct 02 '21 at 16:54
  • Don't worry, it's for a personal module that I'm making and I'm going to create the regex for specific cases. But thanks anyway for the advice! I will keep that in mind in the future! – Dante S. Oct 02 '21 at 16:57
  • Obligatory link to this [tangentially related](https://stackoverflow.com/a/1732454/15452601) question – 2e0byo Oct 02 '21 at 16:57
  • What strange answers; even with Google Translate I understood almost nothing of them. But it's true that regex has certain flaws and isn't perfect, and there may be SQL injections. But I will make sure that I am the only one who enters that data! – Dante S. Oct 02 '21 at 17:04
  • It's a very funny but somewhat overused answer; there's even a [meta discussion](https://meta.stackoverflow.com/questions/252385/why-do-parsing-html-with-regex-questions-come-up-so-often) about it. But the takeaway is: don't expect to build a *robust* parser with regexes; use a real parser instead. Sometimes regexes are fine, however. – 2e0byo Oct 02 '21 at 17:06

1 Answer


What regex do you propose to match the above cases?

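To match column1||column2||column3||... column10, use the regex `(column\d+(\|\|)?)+`. The pipes must be escaped, because a bare `|` means alternation. `\d+` rather than `\d` also covers multi-digit names such as column10.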

```python
>>> import re
>>> m = re.match(r"(column\d+(\|\|)?)+", "column1||column2||column3||column4")
>>> m.group(0)
'column1||column2||column3||column4'
```

A similar regex works for the second case.
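For the `concat(...)` case, here is a minimal sketch. It assumes the arguments are plain column names separated by commas, with optional whitespace after each comma. `CONCAT_RE` and `concat_args` are hypothetical names. Python's `re` keeps only the last capture of a repeated group, so the sketch captures the whole argument list and splits it afterwards.

```python
import re

# Hypothetical pattern: "concat(" + name + zero or more ", name" pieces + ")".
CONCAT_RE = re.compile(r"concat\((\w+(?:,\s*\w+)*)\)")

def concat_args(expr):
    """Hypothetical helper: argument names of a concat(...) call, or None."""
    m = CONCAT_RE.fullmatch(expr)
    if m is None:
        return None
    # A repeated group would only remember its last match, so split group 1.
    return re.split(r",\s*", m.group(1))

print(concat_args("concat(column1, column2, column3)"))
# ['column1', 'column2', 'column3']
print(concat_args("concat(column1,)"))  # None
```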

rok
  • Wow, it worked! I don't know how it didn't occur to me... From your idea I created the regex `(\w+(\|\|)?)+`, and I managed to build the other one as you proposed! Thanks, I don't really know how I could be so stupid! – Dante S. Oct 02 '21 at 17:11
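The generalized regex from that comment, `(\w+(\|\|)?)+`, makes the separator optional after every name, so it also accepts a trailing `||`. It also nests one quantifier inside another, a shape that can backtrack exponentially on long non-matching inputs when the match is anchored (for example with `re.fullmatch`). Here is a sketch comparing it with a stricter variant in which `||` may only appear between two names. The stricter pattern is a suggestion, not part of the thread.

```python
import re

# Regex from the comment: "||" optional after every name (nested quantifiers).
LOOSE = r"(\w+(\|\|)?)+"
# Suggested stricter variant: "||" only between two names, no nesting.
STRICT = r"\w+(?:\|\|\w+)*"

for text in ["column1||column2", "column1||column2||"]:
    print(text, bool(re.fullmatch(LOOSE, text)), bool(re.fullmatch(STRICT, text)))
# column1||column2 True True
# column1||column2|| True False
```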