Clean string in postgresql

Question

I have a column in a table that contains data about any updates related to changes about a company in the below format -

#=============#==============#================#
| Company ID  |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
|             |              | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

As you can see above, the column updates contains strings that include newlines and describe one or multiple updates. In the above example this means that for company ID 101, the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For company ID 109, only the rating changed from 4.5 to 4.0.

However I would like to divide the updates column into 3 columns - one should contain what was changed (url, name etc.), second should have the old value and the 3rd column should have the new value. Something like this -

#============#============#==============#================#
| Company ID |   Field    |  Old Value   |   New Value    |
#============#============#==============#================#
| 101        |   name     | ABC          | XYZ            |
+------------+------------+--------------+----------------+
| 101        |   url      | www.abc.com  | www.xyz.com    |
+------------+------------+--------------+----------------+
| 109        |   rating   | 4.5          | 4.0            |
+------------+------------+--------------+----------------+

I am doing this in Postgres and know how to extract substrings based on characters but this looks a bit complicated to me since I need to extract multiple substrings from the same column for each row. Any help would be appreciated. Thanks!

Is that a single column value with newline characters or are these several table entries? Please describe that table in more detail. — Laurenz Albe, Nov 04 '20 at 08:13
@LaurenzAlbe you're right. That is a single row that has all the updates that happened for a particular company on a particular date. I have updated the OP to give more context around what the present table looks like. — Abhigyan Sarma, Nov 04 '20 at 09:01
Could you specify the DDL of the table? Especially the updates column is interesting. — Philipp Johannis, Nov 04 '20 at 09:34

buddemat · Accepted Answer · 2021-09-20T16:52:00.460

At first, you can use regexp_split_into_table and a regex with a positive lookahead to get a version of your table in which each of the rows contains exactly one update:

select companyID, 
       updated_at, 
       regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
  from old;

This will split the column updates at any newline (\n) that is followed by a single word and a colon (\y.+:).

#=============#==============#================#
| companyID   |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
+-------------+--------------+----------------+
| 101         | 2020-11-01   | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

From this, you can more easily build your desired table. To do this, you can use e.g. split_part to split the update string into the three parts you want.

Putting this together with the first part gets you the full query:

select companyID, 
       updated_at, 
       split_part(updates, E':', 1) as field, 
       split_part(updates, E'\n-', 2) as old_value, 
       split_part(updates, E'\n-', 3) as new_value  
  from (select companyID, 
               updated_at, 
               regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
          from old
       )
;

Here is a db<>fiddle example.

More details / additional info:

Clean string in postgresql

1 Answers1