Overview:
1) I have a .TSV [tab-separated values, flat file] dump that I need to shunt into an existing Postgres table [not of my design, but I do have direct db access]. I intend to use psql \copy to accomplish this;
2) The table in question has one column defined as a one-dimensional array [historical reasons over which I have no control; I know that denormalized columns are far from optimal in most cases];
3) In the TSV, the fifth column contains the values that are destined for the Postgres array column; these values are separated by commas
4) To get these values into the Postgres array column, I believe I need to wrap them in curly braces
-- such that values: foo,bar
become {foo,bar}
[braces only, no quotes -- the single quotes you'd put around an array literal in SQL are not part of COPY data]
My guess is that the best way to go about this is with a regex, but my skills in that area are currently very weak [working on it, have the owl book and the regex cookbook!]. I’ve read several related SO questions/answers, skimmed rexegg and regex101, but can’t find information here or elsewhere that describes a similar-enough situation that I can use the solutions / approaches therein.
There are a few particular aspects of this problem that I am stuck on:
a) Since the data in each column is of variable length [i.e., each value could be any length], I can’t figure out how to identify the fifth column [in order to act on it];
b) Similarly, I need to prepend { to the beginning of the column and append } to the end of it, but since the data within the column is variable in content and length, I can't figure out how to handle that -- e.g., I can't tell the regex to look for any particular character or length that would trigger insertion of the braces in the correct position
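For stuck-point (a), one idea [a sketch, not a definitive answer]: the regex never needs to know anything about the values themselves if it counts delimiters instead -- skip four runs of "zero-or-more non-tabs followed by a tab", then capture the fifth run. A minimal demonstration with GNU sed (standard on RHEL; the abbreviated sample row stands in for the real dump):

```shell
# Skip four "non-tabs then a tab" groups (\1), capture the fifth
# field (\3), and wrap it in braces. GNU sed assumed, for -E and \t.
printf '1234\te@mail.addy\t43210\t0123456789\tfoo_value,bar_value\t107.00\n' |
  sed -E 's/^(([^\t]*\t){4})([^\t]*)/\1{\3}/'
# fifth field becomes {foo_value,bar_value}; all other fields untouched
```

The variable length of each value is irrelevant here because `[^\t]*` matches any run of non-tab characters, however long.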
Below is an example row from the .tsv file; the fifth column with the comma separating the two values is the one I’m trying to act on.
1234 e@mail.addy 43210 0123456789 foo_value,bar_value 107.00 0.00 timestamp_1 timestamp_2 54321 string_2 string_3 timestamp_3 98765 12345 US Hawaii string_4 string_5 string_6 string_7 string_8 false true false
When the regex [or whatever transformation works] is done with it, it should [I think] look like this:
1234 e@mail.addy 43210 0123456789 {foo_value,bar_value} 107.00 0.00 timestamp_1 timestamp_2 54321 string_2 string_3 timestamp_3 98765 12345 US Hawaii string_4 string_5 string_6 string_7 string_8 false true false
so that the Postgres array column will recognize the input as a valid array.
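There is also a non-regex route in the same spirit as the awk answer that eventually worked for me [this sketch is my own; file names are placeholders]: with a tab field separator, awk addresses the fifth column directly as $5, so variable value lengths stop mattering entirely:

```shell
# Tab is both input (FS) and output (OFS) separator; reassigning $5
# rebuilds the record with OFS. Sample file stands in for the dump.
printf '1234\te@mail.addy\t43210\t0123456789\tfoo_value,bar_value\t107.00\n' > sample.tsv
awk 'BEGIN { FS = OFS = "\t" } { $5 = "{" $5 "}" } 1' sample.tsv > wrapped.tsv
cat wrapped.tsv
# fifth field is now {foo_value,bar_value}; everything else passes through
```

The trailing `1` is the idiomatic awk "print every record" pattern.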
Here's the definition of the Postgres table:
CREATE TABLE postgres_table (
col1 SERIAL PRIMARY KEY,
col2 TEXT,
col3 TEXT,
col4 TEXT,
col5 TEXT[], /*this is the array column*/
col6 NUMERIC(19,2) NOT NULL,
col7 NUMERIC(19,2),
col8 TIMESTAMP WITHOUT TIME ZONE,
col9 TIMESTAMP WITHOUT TIME ZONE,
col10 TEXT,
col11 TEXT,
col12 TEXT,
col13 TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
col14 TEXT,
col15 TEXT,
col16 TEXT,
col17 TEXT,
col18 TEXT,
col19 TEXT,
col20 TEXT,
col21 TEXT,
col22 TEXT,
col23 BOOLEAN,
col24 BOOLEAN,
col25 BOOLEAN
);
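Once the fifth column has been brace-wrapped, the load step should be an ordinary \copy [a sketch on my part; the database name and file path are placeholders]. The dump's 25 fields line up one-to-one with the table's 25 columns, so no column list should be needed, and text format already defaults to tab delimiters:

```shell
# "mydb" and "wrapped.tsv" are placeholder names; FORMAT text uses a
# tab as its default delimiter, matching the .tsv dump.
psql -d mydb -c "\copy postgres_table FROM 'wrapped.tsv' WITH (FORMAT text)"
```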
Additional notes: the db is Postgres 11; all systems involved are RHEL-based; I know a bit of JavaScript and SQL, but my attempts thus far have been in bash. I've successfully used \copy for many .csv-type dumps in the past [but never to a denormalized table], and I thought the SO question below would get me most of the way there, but I get the error "substitution failed" when I try to adapt the solution to my situation:
How can i capture all data from a certain column?
To clarify: I'm eager to use whatever approach works, so if there is a better way to go about this than regex, I'm all ears! All guidance is much appreciated.
EDITS:
Much thanks to @James Brown and @jjanes -- I don't have enough 'reputation points' yet to upvote your answers, but will do for sure as soon as I can.
The awk solution from @James Brown below worked for me -- the processed file contained a trailing newline that I had to zap with the awk script from this unix.SE question: https://unix.stackexchange.com/questions/140727/how-can-i-delete-a-trailing-newline-in-bash
I'm stoked to have an impetus to learn some more awk; what a great tool.
UPDATE: I'm still working on getting @jjanes's solution working for my own edification -- still running into permissions issues after the GRANT TEMP approach, but I suspect this has something to do with the way RDS [where the db is hosted] handles remote psql requests -- will update again when I have it working. When I tried @jjanes's solution below, I ran into the following problem at the \copy step:
ERROR: permission denied for schema pg_temp_5
Here's the output of \z pg_temp_5.* :
                             Access privileges
  Schema   |  Name  | Type  |  Access privileges  | Column access privileges
-----------+--------+-------+---------------------+--------------------------
 pg_temp_5 | foobar | table | user1=arwdDxt/user1 |
I've tried:
GRANT USAGE ON SCHEMA pg_temp_5 TO user1;
GRANT ALL ON SCHEMA pg_temp_5 TO user1;
ALTER TABLE pg_temp_5.foobar OWNER TO user1;
as suggested in the following SF question, but no dice:
https://serverfault.com/questions/488669/postgres-insert-error-permission-denied-for-schema-public
I can’t see any reason why this shouldn’t work, a real head-scratcher.
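For reference, the staging approach I'm trying to adapt looks roughly like the following [my own rough reconstruction, not the answer verbatim; the database name, staging-table name, and file path are all placeholders]: load the raw rows into a temp table whose col5 is plain text, then convert with string_to_array on the way into the real table:

```shell
# Hypothetical reconstruction; "mydb", "staging", and "dump.tsv" are
# placeholders. Requires TEMP privilege on the database.
psql -d mydb <<'SQL'
CREATE TEMP TABLE staging (LIKE postgres_table);
-- hold the raw comma-separated string instead of text[]
ALTER TABLE staging ALTER COLUMN col5 TYPE text USING col5::text;
\copy staging FROM 'dump.tsv'
INSERT INTO postgres_table
SELECT col1, col2, col3, col4, string_to_array(col5, ','),
       col6,  col7,  col8,  col9,  col10, col11, col12, col13,
       col14, col15, col16, col17, col18, col19, col20, col21,
       col22, col23, col24, col25
FROM staging;
SQL
```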