0

Following this example, I am having trouble using a PostgreSQL variables in a *.sql script:

  • I want to iterate over a number of XML data files, using a BASH script

  • the BASH script assigns XML file names to a variable, that is passed to SQL script

  • the SQL script, called by that BASH script, loads those data into PostgreSQL

If I source the XML files directly, there is no problem; however, I cannot access that variable, in my SQL script:

In my SQL script (hmdb.sql) I can access the PSQL variable :bash_var (passed from the BASH script):

\echo '\nEXTERNAL VARIABLE (= "datafile", HERE):' :bash_var '\n'

and/or directly reference the XML file,

datafile text := 'hmdb/hmdb.xml';

but not as a variable:

datafile text := 'bash_var';

hmdb.sh

#!/bin/bash

DATA_DIR=data/

for file in $DATA_DIR/*.xml
  do
    bash_var=$(echo ${file##*/})
    echo $bash_var
    psql -d hmdb -v bash_var=$bash_var -f hmdb.sql
done
Victoria Stuart
  • 4,610
  • 2
  • 44
  • 37
  • 1
    I deal daily with tons of XML files and I always found that approach easy to implement and to maintain: https://stackoverflow.com/questions/49862106/importing-an-xml-schema-into-postgres-to-automatically-create-a-table-and-then-p/49863444#49863444 doing so you simplify a lot your data import. – Jim Jones Apr 27 '18 at 07:29
  • @JimJones: thank you; funny timing, eh? Mentioned in your answer (other [SO question](https://stackoverflow.com/questions/49862106/importing-an-xml-schema-into-postgres-to-automatically-create-a-table-and-then-p/49863444#49863444)), I am also using `xml_split` as my file is huge (3.4 GB), and I was getting OOM issues. I'll post back here once I work through this. :-) – Victoria Stuart Apr 27 '18 at 19:59

1 Answers1

1

OK, here is my solution.

I post a more detailed answer on my Persagen.com blog.

Basically, I decided to abrogate the DO $$DECLARE ... approach (described in SO 49950384) in favor of the simplified approach, below.

I am then able to access the BASH / PSQL shared variable, :bash_var, thusly:

xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'))))

Here is a sample SQL script, illustrating that usage:

hmdb.sql

\c hmdb

CREATE TABLE hmdb_identifiers (
  id SERIAL,
  accession VARCHAR(15) NOT NULL,
  name VARCHAR(300) NOT NULL,
  cas_number VARCHAR(12),
  pubchem_cid INT,
  PRIMARY KEY (id),
  UNIQUE (accession)
);

\echo '\n[hmdb.sql] bash_var:' :bash_var '\n'

-- UPDATE (2019-05-15): SEE MY COMMENTS BELOW RE: TEMP TABLE!
CREATE TEMP TABLE tmp_table AS 
SELECT 
  (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
  ,(xpath('//name/text()', x))[1]::text::varchar(300) AS name 
  ,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number 
  ,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid 
-- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;

INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
  SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table;

DROP TABLE tmp_table;

SQL script notes:

  • In the xpath statements I recast the ::text (e.g.: ::text::varchar(15)) per the Postgres table schema.

  • More significantly, if I did not recast the datatypes in the xpath statement and a field entry (e.g. name length) exceeded the SQL varchar(300) length limit, those data threw a PSQL error and the table did not update (i.e. a blank table results).

I uploaded the XML data files used in this answer at this Gist

https://gist.github.com/victoriastuart/d1b1959bd31e4de5ed951ff4fe3c3184

Direct links:


UPDATE (2019-05-15)

In follow-on work, detailed in my research blog post Exporting Plain Text to PostgreSQL, I directly load XML data into PostgreSQL, rather than using temp tables.

TL/DR. In that project, I observed the following improvements.

Parameter | Temp Tables  | Direct Import | Reduction
    Time: | 1048 min     | 1.75 min      | 599x
   Space: | 252,000 MB   | 18 MB         | 14,000x
Victoria Stuart
  • 4,610
  • 2
  • 44
  • 37