
I have a table:

id                                      date
5356:type=sub&quality=tier3&country=de  2022-07-01
351:country=fr&type=follow              2022-07-01

I want to split these ids (to create a view) like:

  • the part before `:` is the actual id;
  • the part before each `=` is a column name for a parameter;

So I expect this:

id    date        type    quality  country
5356  2022-07-01  sub     tier3    de
351   2022-07-01  follow  NULL     fr

I can't use `split_part` alone for this, because the order of the parameters inside the id is not fixed.
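
A minimal setup to reproduce this (table and column names are assumptions; the answer below uses `my_table`):

-- Sketch of the source table; types inferred from the sample data
create table my_table (
    id   text,
    date date
);

insert into my_table values
    ('5356:type=sub&quality=tier3&country=de', '2022-07-01'),
    ('351:country=fr&type=follow',             '2022-07-01');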

fujidaon
  • What happens as other rows have different columns? Does this table/result-set just grow horizontally to accommodate every possible value that might appear as a column? – JNevill Aug 09 '22 at 16:19
  • Why is such a format used in the first place, instead of a proper table structure? Even if the parameters are dynamic, you should have an actual primary key column and eg a JSON field that can be parsed using JSON functions. You *can't* get a `null` country though unless you already specify somehow that this is an expected column. At which point we go back to `why not use a proper table`? – Panagiotis Kanavos Aug 09 '22 at 16:33
  • id can contain only 5 parameters: type, quality, country, active, age, but the order is not guaranteed. The table doesn't grow horizontally, and neither does the expected view: it has only id, date, and the 5 parameters (only 3 in the example), so it's possible to hardcode column names and use select with where clauses, I guess? – fujidaon Aug 09 '22 at 16:36
  • @PanagiotisKanavos These strings come from an API, so I store them in their original form. – fujidaon Aug 09 '22 at 16:41
  • Parse it instead and store it in a usable form. SQL, the language, isn't great at text manipulation. It's a *Q*uery language. Databases are fast because they can use indexes over their data and optimize queries. When you have to perform complex parsing to extract data, any index on the field value is useless and can't be used. This means that every time you try to load item 1536 you'll have to parse all 1 Million table rows to find the single answer. Every time – Panagiotis Kanavos Aug 09 '22 at 16:45
  • The cheapest and fastest option would be to use proper columns. If that's not possible (why?), at the very least use a primary key and use JSON columns. PostgreSQL can index JSON data *BUT* that will still be slower than indexes over simple columns. JSON isn't an excuse for bad table design but it does allow storing and querying "bags" of dynamic properties – Panagiotis Kanavos Aug 09 '22 at 16:49
  • As for `these strings come from API` , HTTP APIs typically return JSON strings, not what looks like the query parameter section of a URL. Unless the API response is a `multipart/form-data` response? This can be parsed by any client language one way or another. – Panagiotis Kanavos Aug 09 '22 at 16:52
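
As an aside, a minimal sketch of the jsonb-column design suggested in the comments (all names here are illustrative, not from the question):

-- Store the key once and the dynamic parameters as a jsonb "bag"
create table events (
    id     bigint primary key,
    date   date   not null,
    params jsonb  not null
);

-- A GIN index lets containment queries such as
--   select * from events where params @> '{"country": "de"}'
-- use the index instead of parsing every row
create index on events using gin (params);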

1 Answer


Transforming these strings into jsonb objects is relatively straightforward:

select
    split_part(id, ':', 1) as id,    -- part before ':' is the actual id
    date,
    -- collect key/value pairs: key is the part before '=', value the part after
    jsonb_object_agg(split_part(param, '=', 1), split_part(param, '=', 2)) as params
from my_table
-- one row per 'key=value' fragment of the part after ':'
cross join unnest(string_to_array(split_part(id, ':', 2), '&')) as param
group by id, date;
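
With the sample rows, the intermediate result should look roughly like this (jsonb key order may differ):

 id   | date       | params
------+------------+------------------------------------------------------
 5356 | 2022-07-01 | {"type": "sub", "country": "de", "quality": "tier3"}
 351  | 2022-07-01 | {"type": "follow", "country": "fr"}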

Now you can use the solution described in "Flatten aggregated key/value pairs from a JSONB field?"

Alternatively, if you know the number and names of the parameters, this query is simpler and works well:

select
    id,
    date,
    params->>'type'    as type,
    params->>'quality' as quality,   -- missing keys simply yield NULL (e.g. quality for id 351)
    params->>'country' as country
from (
    select
        split_part(id, ':', 1) as id,
        date,
        jsonb_object_agg(split_part(param, '=', 1), split_part(param, '=', 2)) as params
    from my_table
    cross join unnest(string_to_array(split_part(id, ':', 2), '&')) as param
    group by id, date
    ) s;

Test it in Db<>fiddle.

In Postgres 14+ you can replace unnest(string_to_array(...)) with string_to_table(...).
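
For example, the inner aggregation then becomes:

select
    split_part(id, ':', 1) as id,
    date,
    jsonb_object_agg(split_part(param, '=', 1), split_part(param, '=', 2)) as params
from my_table
-- string_to_table() returns a set directly, no intermediate array needed
cross join string_to_table(split_part(id, ':', 2), '&') as param
group by id, date;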

klin