
I have the following table in Postgres 9.6:

CREATE TABLE some_tbl(
  target_id integer NOT NULL
, machine_id integer NOT NULL
, dateread timestamp without time zone NOT NULL
, state text
, ftime text
, CONSTRAINT pk_sometable PRIMARY KEY (target_id, machine_id, dateread)
 );

With data like:

target_id  machine_id  dateread               state    ftime
60000      30          '2021-09-29 15:20:00'  '0|1|0'  '850|930|32000'
60000      31          '2021-09-29 16:35:13'  '0|0|0'  '980|1050|30000'

The important parts are state and ftime. I need to unnest their elements while keeping their order; each position becomes a step.

For example, the first row will be:

target_id  machine_id  dateread               state  ftime    step
60000      30          '2021-09-29 15:20:00'  '0'    '850'    0
60000      30          '2021-09-29 15:20:00'  '1'    '930'    1
60000      30          '2021-09-29 15:20:00'  '0'    '32000'  2

The order is important: ftime 850 ms always comes first and gets step 0, then 930 ms is second and gets step 1, and finally 32000 ms is third and gets step 2.

Currently I solve this by first using string_to_array() to turn the text into an array, then unnest(), and finally row_number() to assign the step number.

This works fine - except that sometimes the indexes come out of order. Like this for the first row:

target_id  machine_id  dateread               state  ftime    step
60000      30          '2021-09-29 15:20:00'  '1'    '930'    0
60000      30          '2021-09-29 15:20:00'  '0'    '32000'  1
60000      30          '2021-09-29 15:20:00'  '0'    '850'    2

I run this over thousands of records and practically all come out right. But later I compute statistics (min, max, average) over the result in a massive ETL process, and some values are wrong. When I check, the step indexes turn out to be the problem - yet if I run the SELECT for just the failing row, the result looks perfect. So I assume row_number() sometimes assigns the indexes in the wrong order, and it appears to be random.

This is the SQL that I use:

SELECT foo.target_id,
       foo.dateread,
       foo.machine_id,
       foo.state,
       foo.ftime,
       (row_number() OVER (PARTITION BY foo.dateread, foo.machine_id, foo.target_id)) - 1 AS step
FROM  (SELECT target_id,
              machine_id,
              dateread,
              unnest(string_to_array(state, '|'::text))::integer AS state,
              unnest(string_to_array(ftime, '|'::text))::integer AS ftime
       FROM   some_tbl
       WHERE  target_id IN (6000) AND dateread = '2021-06-09') foo

Is there some better way to do this?

Alvaro
  • Your precise table definition (`CREATE TABLE` statement showing data types and constraints) and your version of Postgres, please. – Erwin Brandstetter Sep 29 '21 at 22:12
  • Hi, this is the BASE TABLE: 'CREATE TABLE schema.sometable ( target_id integer NOT NULL, machine_id integer NOT NULL, dateread timestamp without time zone NOT NULL, state text, ftime text, CONSTRAINT pk_sometable PRIMARY KEY (target_id, machine_id, dateread) ) WITH ( OIDS=FALSE ); ALTER TABLE schema.sometable OWNER TO postgres;' AND I am using Postgres 9.6. Thanks @ErwinBrandstetter – Alvaro Sep 29 '21 at 22:23
  • [Edit] the table definition into the question, not into a comment, please. – Erwin Brandstetter Sep 29 '21 at 22:27
  • Ok, I did it. Thank you. – Alvaro Sep 29 '21 at 22:37

1 Answer


One elegant way is to use the special implementation of unnest() for multiple input arrays in a LATERAL subquery and attach WITH ORDINALITY (both available since Postgres 9.4, so they work in your 9.6):

SELECT t.target_id, t.dateread, t.machine_id, u.state, u.tiempo
     , ord - 1 AS step
FROM   some_tbl t
LEFT   JOIN LATERAL unnest(string_to_array(state, '|')::int[]
                         , string_to_array(ftime, '|')::int[]) WITH ORDINALITY AS u(state, tiempo, ord) ON true
WHERE  target_id = 60000
AND    dateread = '2021-09-29 15:20:00'   -- adapted
ORDER  BY t.target_id, t.dateread, t.machine_id, step;

db<>fiddle here
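
Against the sample data above, this should return exactly the expected output from the question, now with state and tiempo as proper integers:

target_id  dateread             machine_id  state  tiempo  step
60000      2021-09-29 15:20:00  30          0      850     0
60000      2021-09-29 15:20:00  30          1      930     1
60000      2021-09-29 15:20:00  30          0      32000   2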

Since state and ftime can be NULL, I use LEFT JOIN ... ON true to keep such rows in the result.
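
A minimal sketch of the difference, with made-up inline data (not from the question): a plain CROSS JOIN LATERAL would drop a row whose arrays are NULL, while LEFT JOIN ... ON true keeps it:

SELECT t.id, u.val
FROM  (VALUES (1, '1|2'), (2, NULL)) AS t(id, s)  -- row 2 simulates a NULL state/ftime
LEFT   JOIN LATERAL unnest(string_to_array(t.s, '|')) AS u(val) ON true;
-- id = 2 survives with val = NULL; with CROSS JOIN LATERAL it would vanish.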

Of course, what you really should do is this:

  1. Unfriend the guy who designed the database. (PC version of my real advice.)
  2. Install a current Postgres version. See: https://www.postgresql.org/support/versioning/
  3. Create a new database with a proper relational design.
  4. Migrate your data; a sketch follows the new schema below. (And keep a backup of your original to be safe.)
  5. Burn the old DB and never speak of it again.

A proper (normalized) relational design in modern Postgres could look like this:

CREATE TABLE tbl (
  tbl_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, target_id integer NOT NULL
, machine_id integer NOT NULL
, read_timestamp timestamp with time zone NOT NULL
, CONSTRAINT tbl_uni UNIQUE (target_id, machine_id, read_timestamp)
);

CREATE TABLE tbl_step (
  tbl_id int REFERENCES tbl ON DELETE CASCADE
, step int NOT NULL
, state int NOT NULL
, tiempo int NOT NULL
, CONSTRAINT tbl_step_pkey PRIMARY KEY (tbl_id, step)
);
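
Step 4, the data migration, could then be a one-time job like this sketch, assuming the old table is still named some_tbl as in the question:

INSERT INTO tbl (target_id, machine_id, read_timestamp)
SELECT target_id, machine_id, dateread  -- timestamp is cast to timestamptz at the session time zone
FROM   some_tbl;

INSERT INTO tbl_step (tbl_id, step, state, tiempo)
SELECT t.tbl_id, u.ord - 1, u.state, u.tiempo
FROM   some_tbl s
JOIN   tbl t ON  t.target_id  = s.target_id
             AND t.machine_id = s.machine_id
             AND t.read_timestamp = s.dateread
CROSS  JOIN LATERAL unnest(string_to_array(s.state, '|')::int[]
                         , string_to_array(s.ftime, '|')::int[])
       WITH ORDINALITY AS u(state, tiempo, ord);
-- CROSS JOIN (not LEFT JOIN) is deliberate here: source rows with NULL
-- state/ftime produce no steps, matching the NOT NULL constraints in tbl_step.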

Then your query would just be:

SELECT *
FROM   tbl 
LEFT   JOIN tbl_step USING (tbl_id);
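
And the statistics mentioned in the question become plain aggregates - a sketch, assuming you want min, max, and average of tiempo per step:

SELECT step
     , min(tiempo) AS min_tiempo
     , max(tiempo) AS max_tiempo
     , round(avg(tiempo), 2) AS avg_tiempo
FROM   tbl_step
GROUP  BY step
ORDER  BY step;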
Erwin Brandstetter
    Thanks a lot Erwin, I was checking your SQL and it works great; I even compared the massive load before and after and found over 30k differences. I've been checking the differences and so far all are correct. Really thanks a lot, master. Have a nice weekend. – Alvaro Oct 02 '21 at 00:35