2

note: this is a question about errors encountered while trying to execute the suggestion in import-xml-files-to-postgresql

I am attempting to import a one-row XML file in order to test the code needed to import all of my rows, which should be over 600,000. My XML looks like this:

<response>
<row>
<row _id="1" _uuid="7A68A6C8-3E73-4976-A4BD-9995F97A580F" _position="1" _address="https://data.kcmo.org/resource/vrys-qgrz/1">
<objectid>471537</objectid>
<parcelid>2960</parcelid>
<kivapin>100064</kivapin>
<subdivision></subdivision>
<landusecode>1111 - Single Family (Non-Mobile Home Park)</landusecode>
<apn>CL1330600060270001</apn>
<parceltype>Parcels</parceltype>
<status>2 - Existing</status>
<condo>No</condo>
<prefix>N</prefix>
<own_name>Smith John</own_name>
<own_addr>123 Main Street</own_addr>
<own_city>Kansas City</own_city>
<own_zip>64114-1234</own_zip>
<shape_length>410.3620269</shape_length>
<shape_area>9314.662882</shape_area>
<latitude>39.2636</latitude>
<longitude>-94.5698</longitude>
<location_1 human_address="{&quot;address&quot;:&quot;123 Main Street&quot;,&quot;city&quot;:&quot;Kansas City&quot;,&quot;state&quot;:&quot;MO&quot;,&quot;zip&quot;:&quot;64114-1234&quot;}" latitude="39.2636" longitude="-94.5698" needs_recoding="false"/>
</row>
</row>
</response>

My code to insert this into a table on my database is as follows:

SELECT
  (xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
  (xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
  (xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
  (xpath('//subdivision/text()', myTempTable.myXmlColumn))[1]::text AS subdivision,
  (xpath('//block/text()', myTempTable.myXmlColumn))[1]::text AS block,
  (xpath('//lot/text()', myTempTable.myXmlColumn))[1]::text AS lot,
  (xpath('//datecreated/text()', myTempTable.myXmlColumn))[1]::text AS datecreated,
  (xpath('//landusecode/text()', myTempTable.myXmlColumn))[1]::text AS landusecode,
  (xpath('//apn/text()', myTempTable.myXmlColumn))[1]::text AS apn,
  (xpath('//parceltype/text()', myTempTable.myXmlColumn))[1]::text AS parceltype,
  (xpath('//status/text()', myTempTable.myXmlColumn))[1]::text AS status,
  (xpath('//condo/text()', myTempTable.myXmlColumn))[1]::text AS condo,
  (xpath('//platname/text()', myTempTable.myXmlColumn))[1]::text AS platname,
  (xpath('//fraction/text()', myTempTable.myXmlColumn))[1]::text AS fraction,
  (xpath('//prefix/text()', myTempTable.myXmlColumn))[1]::text AS prefix,
  (xpath('//suite/text()', myTempTable.myXmlColumn))[1]::text AS suite,
  (xpath('//own_name/text()', myTempTable.myXmlColumn))[1]::text AS own_name,
  (xpath('//own_addr/text()', myTempTable.myXmlColumn))[1]::text AS own_addr,
  (xpath('//own_city/text()', myTempTable.myXmlColumn))[1]::text AS own_city,
  (xpath('//own_zip/text()', myTempTable.myXmlColumn))[1]::text AS own_zip,
  (xpath('//blvdfront/text()', myTempTable.myXmlColumn))[1]::text AS blvdfront,
  (xpath('//lastupdate/text()', myTempTable.myXmlColumn))[1]::text AS lastupdate,
  (xpath('//shape_length/text()', myTempTable.myXmlColumn))[1]::text AS shape_length,
  (xpath('//shape_area/text()', myTempTable.myXmlColumn))[1]::text AS shape_area,
  (xpath('//latitude/text()', myTempTable.myXmlColumn))[1]::text AS latitude,
  (xpath('//longitude/text()', myTempTable.myXmlColumn))[1]::text AS longitude,
  (xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,
  myTempTable.myXmlColumn as myXmlElement
FROM unnest(
  '//row'
  ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8'))
) AS myTempTable(myXmlColumn);

An attempt to execute this statement renders this error:

[2018-03-26 19:42:50] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
(xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
(xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
...
[2018-03-26 19:42:50] [42601] ERROR: syntax error at or near "myTempTable"
[2018-03-26 19:42:50] Position: 2058
[2018-03-26 19:42:50] Summary: 1 of 1 statements executed, 1 failed in 380ms (2293 symbols in file)

I thought that maybe this was an issue of some syntax error within the body of code, so I ran only the first xpath statement, but that gives an error:

[2018-03-26 19:46:17] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
myTempTable.myXmlColumn as myXmlElement
FROM unnest(
'//row'
,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8...
[2018-03-26 19:46:17] [42804] ERROR: could not determine polymorphic type because input has type "unknown"
[2018-03-26 19:46:17] Summary: 1 of 1 statements executed, 1 failed in 385ms (273 symbols in file)

I'm not quite sure where to go from here.

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
nerdenator
  • 1,265
  • 2
  • 18
  • 35
  • it is odd. in 9.5 unknown should be casted to text on its own, but it seems to not happen - try explicit cast – Vao Tsun Mar 27 '18 at 07:17

2 Answers2

1

Once you have the XML document already in your table, you can parse it using something like this:

 WITH j AS (SELECT UNNEST(XPATH('//row',myXmlColumn)) AS myXmlColumn
 FROM myTempTable)
 SELECT
      (xpath('//objectid/text()', j.myXmlColumn))[1]::text AS objectid,
      (xpath('//parcelid/text()', j.myXmlColumn))[1]::text AS parcelid,
      (xpath('//kivapin/text()', j.myXmlColumn))[1]::text AS kivapin,
      (xpath('//subdivision/text()', j.myXmlColumn))[1]::text AS subdivision,
      (xpath('//block/text()', j.myXmlColumn))[1]::text AS block,
      (xpath('//lot/text()', j.myXmlColumn))[1]::text AS lot,
      (xpath('//datecreated/text()', j.myXmlColumn))[1]::text AS datecreated,
      (xpath('//landusecode/text()', j.myXmlColumn))[1]::text AS landusecode,
      (xpath('//apn/text()', j.myXmlColumn))[1]::text AS apn,
      (xpath('//parceltype/text()', j.myXmlColumn))[1]::text AS parceltype,
      (xpath('//status/text()', j.myXmlColumn))[1]::text AS status,
      (xpath('//condo/text()', j.myXmlColumn))[1]::text AS condo,
      (xpath('//platname/text()', j.myXmlColumn))[1]::text AS platname,
      (xpath('//fraction/text()', j.myXmlColumn))[1]::text AS fraction,
      (xpath('//prefix/text()', j.myXmlColumn))[1]::text AS prefix,
      (xpath('//suite/text()', j.myXmlColumn))[1]::text AS suite,
      (xpath('//own_name/text()', j.myXmlColumn))[1]::text AS own_name,
      (xpath('//own_addr/text()', j.myXmlColumn))[1]::text AS own_addr,
      (xpath('//own_city/text()', j.myXmlColumn))[1]::text AS own_city,
      (xpath('//own_zip/text()', j.myXmlColumn))[1]::text AS own_zip,
      (xpath('//blvdfront/text()', j.myXmlColumn))[1]::text AS blvdfront,
      (xpath('//lastupdate/text()', j.myXmlColumn))[1]::text AS lastupdate,
      (xpath('//shape_length/text()', j.myXmlColumn))[1]::text AS shape_length,
      (xpath('//shape_area/text()', j.myXmlColumn))[1]::text AS shape_area,
      (xpath('//latitude/text()', j.myXmlColumn))[1]::text AS latitude,
      (xpath('//longitude/text()', j.myXmlColumn))[1]::text AS longitude,
      (xpath('//location1/text()', j.myXmlColumn))[1]::text AS location1,
      j.myXmlColumn as myXmlElement
    FROM j

CTEs are not always my first choice when dealing with huge amount of data, but it certainly makes the code much more readable and it's worth taking into account when dealing with data imports.

Regarding importing XML files into PostgreSQL, I always use COPY for the matter and use an intermediate table to store the XML document before unnesting it.

Something like described here:

$ psql db -c "CREATE TABLE tmp (doc XML);"
$ cat xmlfile.xml | psql db -c "COPY tmp FROM STDIN"

If PostgreSQL complains about your data having line feeds \n, you can sort of escape them using tools like sed, tr or even with perl -pe:

$ cat xmlfile.xml | perl -pe 's/\n/\\n/g' | psql db -c "COPY tmp FROM STDIN"

By the way: you were missing one comma , in your query right after this xpath expression: (xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,

EDIT: If you have the luxury of placing files directly into the database server's file system (most of us don't), you can keep using the combination of pg_read_binary_file and convert_from via UNNEST but keep in mind that the expression //row/ results in an unknown type, which can be tricky for using as parameters in functions. Instead, use a simple XPATH expression to do the job:

SELECT
...
FROM UNNEST(XPATH(
  '//row'
  ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8')))
) AS myTempTable(myXmlColumn);
Jim Jones
  • 18,404
  • 3
  • 35
  • 44
  • Now, that point about the comma is interesting. When I add it back in, I get the following error: `[2018-03-27 19:29:10] [42804] ERROR: could not determine polymorphic type because input has type "unknown"`. – nerdenator Mar 28 '18 at 00:31
  • I'll edit my answer with a solution using your approach to import files – Jim Jones Mar 28 '18 at 09:59
0
t=# select pg_get_function_arguments(oid),oid::regprocedure from pg_proc where proname = 'pg_read_binary_file';
   pg_get_function_arguments   |                       oid
-------------------------------+-------------------------------------------------
 text, bigint, bigint          | pg_read_binary_file(text,bigint,bigint)
 text, bigint, bigint, boolean | pg_read_binary_file(text,bigint,bigint,boolean)
 text                          | pg_read_binary_file(text)
(3 rows)

try casting pg_read_binary_file('parcel_data_first_row.xml' as:

pg_read_binary_file('parcel_data_first_row.xml'::text)
Vao Tsun
  • 47,234
  • 13
  • 100
  • 132