1

Some time ago I needed help with a postgresql database with a jsonb field to convert numerical values surrounded by double quotes into standard numerical values and this solution worked very well: Eliminate double quotes from a json field in a selective manner in PostgreSQL

I had a field named "test_json" which is composed of for example these 2 rows:

{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}

And the problem was to have these 2 lines after correction:

{"test_name": "Full", "parameter1": 4.219, "parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}

So the proposed solution is really great...

However I have now cases where the database has been fed with a random character instead of a number in this case this conversion does not work. Example of bad datas:

{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "}", "parameter2": 4.159}

Or

{"test_name": "Full", "parameter1": "u", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}

But it is hardly the same character so I struggle to find and replace them automatically…

Does anyone have an idea for a postgresql function in a json field to test if the value of the field “parameter1” can be converted to numerical and, if not, replace it with a missing value within this field? Like for example after final correction with old first line to "parameter1": "u"

{"test_name": "Full,"parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}

Thanks a lot!!

Franck
  • 25
  • 4

1 Answers1

1

step-by-step demo:db<>fiddle

Deconstruct the JSON element, filter all numeric-like texts and reconstruct the resulting elements:

SELECT
    id,
    jsonb_object_agg(key, value::numeric)    -- 3
       ||  jsonb_build_object('test_name', test_json -> 'test_name') as test_json  -- 4
FROM
    t_test,
    jsonb_each_text(test_json) as objects    -- 1
WHERE value ~ '^[0-9\.]+$'                   -- 2
GROUP BY id, test_json
  1. Deconstruction: All attributes to one row
  2. Filter all values that look like numbers
  3. Reaggregate the JSON object
  4. Add the "test_name" attribute manually, because this attribute was removed by the filter before.

The UPDATE statement:

UPDATE t_test t
SET test_json = s.test_json
FROM (
    -- <query above>
) s
WHERE t.id = s.id;

Edit: 2nd Version using regexp_replace():

  1. Step: Eliminate all non-numeric occurrences text-based using a RegExp:

demo:db<>fiddle

SELECT 
    regexp_replace(
        test_json::text, 
        '"((parameter1|parameter2)": "(?:(?![0-9\.]+).)+?)"', '', 'g'
    )
FROM
    t_test

This uses the "negative lookbehind" solution from this answer: https://stackoverflow.com/a/977294/3984221

  1. Step: Eliminate the " characters with your former approach.
S-Man
  • 22,521
  • 7
  • 40
  • 63
  • Thank you! The problem with deconstruction is that this json field is made up of about a hundred parameters. I've put only a small part here to illustrate... sorry about that – Franck Nov 20 '20 at 08:36
  • Added a second approach. – S-Man Nov 20 '20 at 09:46
  • Thanks! Unfortunately it doesn't seem to work? I still have: "hysteresis": "}", "max_residuals": 1.876657 for the row that I want to correct for my try – Franck Nov 20 '20 at 10:35
  • Could you please add your test cases into the fiddle I provided in the answer. Then I could have closer look at this. – S-Man Nov 20 '20 at 10:39
  • Thanks, you are right: it should work. I will try again checking any typo in my code on the real database... – Franck Nov 20 '20 at 11:12
  • OK so it seems that as soon as I use the real name of the parameter with _ and () in it, it doesn't work anymore. Sorry I am not use to regex function to understand how to correct your code to make it works. I have update the fiddle... thanks again – Franck Nov 20 '20 at 12:04
  • Hi, when you updated the fiddle, you must "run" your new version. This generates a new link with the updated data. Please send the new link :) – S-Man Nov 20 '20 at 12:43
  • Yes sorry it is a RTFM issue! ^^ Here is the new link https://dbfiddle.uk/?rdbms=postgres_13&fiddle=bf0e713fe3f9e0326bde14e7d89d811c – Franck Nov 20 '20 at 13:10
  • 1
    It's the parentheses. You have to escape them in the RegExp: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=f49ee3d70ef35324397ebfe0d72a3e11 – S-Man Nov 20 '20 at 15:00
  • Great! Thank you very very much it saves me several hours of database cleaning! – Franck Nov 20 '20 at 15:11
  • just one last little question: when I run the regexp_replace function on my real database, dbeaver displays the json field which is well modified: great! But when I go then to see the field with a select from... the field is still at the old value. Isn't regexp_replace supposed to update the json field? Do I have to use another command after that to do this update? Sorry if my question is a low level one I'm a complete newbie in postgresql and database administration... Thank you! The fiddle to illustrate: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=10c753788683981ca5595e4e65253194 – Franck Nov 20 '20 at 17:28
  • 1
    Yes, of course, you need to run an UPDATE command. See my first example and your example how to do this. – S-Man Nov 20 '20 at 17:53