0

I'm trying to find a regex expression in python that will be able to handle most of the UPDATE queries that I throw at if from my DB. I can't use sqlparse or any other libraries that may be useful with for this, I can only use python's built-in modules or cx_Oracle, in case it has a method I'm not aware of that could do something like this.

Most update queries look like this:

UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-202023:59:59','DD-MON-YYYYHH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;

Most update queries I use have a version of these types of updates. The output has to be a list including each separate SQL keyword (UPDATE, SET, WHERE), each separate update statement(i.e COLUMN_NAME=2) and the final identifier (CODE=9999).

Ideally, the result would look something like this:

list = ['UPDATE', 'TABLE_NAME', 'SET', 'COLUMN_NAME=2', 'OTHER_COLUMN=("31-DEC-2020 23:59:59","DD-MON-YYYY HH24:MI:SS")', COLUMN_STRING='Hello, thanks for your help', 'UPDATED_BY=-100', 'WHERE', 'CODE=9999']

Initially I tried doing this using a string.split() splitting on the spaces, but when dealing with one of my slightly more complex queries like the one above, the split method doesn't deal well with string updates such as the one I'm trying to make in COLUMN_STRING or those in OTHER_COLUMN due to the blank spaces in those updates.

  • `.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")` will do for you as explained here : [answer](https://stackoverflow.com/a/18893443/14872512) – Younes Feb 18 '21 at 13:30
  • @Younes thanks for your reply. Unfortunately this regex doesn't deal well with the date values or string updates that have a comma in them as it seems to split a string on the commas. Also, I doesn't split the string at the UPDATE, SET, WHERE, keywords. This means that I end up with a ```UPDATE TABLE_NAME SET COLUMN_NAME=2``` in the 0 index of my list, where I would like my 0th index to be UPDATE, 1st index to be TABLE_NAME, etc. – Aarón González Feb 18 '21 at 13:47
  • ah i get it, i will post a workaround answer that will get you what you want i hope. – Younes Feb 18 '21 at 14:29

1 Answers1

0

Let's use the shlex module :

import shlex
test="UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-202023:59:59','DD-MON-YYYYHH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;"

t=shlex.split(test)

Up to here, we won't get rid of comma delimiters and the last semi one, so maybe we can do this :

for i in t:
    if i[-1] in [',',';']:
        i=i[:-1]

If we print every element of that list we'll get :

UPDATE
TABLE_NAME
SET
COLUMN_NAME=2
OTHER_COLUMN=to_date(31-DEC-202023:59:59,DD-MON-YYYYHH24:MI:SS)
COLUMN_STRING=Hello, thanks for your help
UPDATED_BY=-100
WHERE
CODE=9999

Not a proper generic answer, but serves the purpose i hope.

Younes
  • 391
  • 2
  • 9