What would be the best way to insert XML data (that I get from a webpage) into PostgreSQL database?
I'm using Java and need a little help finding a good way to read this data into the database.

- 605,456
- 145
- 1,078
- 1,228
-
4when you say xml data to database, you mean you want to insert complete xml (as String/Text) or parse the fields and insert each field? – kdabir Sep 20 '11 at 20:42
-
1Just adding the xml file to database, or parsing values from it and saving into specific tables? – EO2 Sep 20 '11 at 20:43
-
this is the data http://www.sedlabanki.is/?PageID=289 And yes, i have one table that this data should go into, that table have same column as each
have. – Sep 20 '11 at 20:47
3 Answers
I have a working implementation where I do everything inside PostgreSQL without additional libraries.
Auxiliary parsing function
CREATE OR REPLACE FUNCTION f_xml_extract_val(text, xml)
RETURNS text AS
$func$
SELECT CASE
WHEN $1 ~ '@[[:alnum:]_]+$' THEN
(xpath($1, $2))[1]
WHEN $1 ~* '/text()$' THEN
(xpath($1, $2))[1]
WHEN $1 LIKE '%/' THEN
(xpath($1 || 'text()', $2))[1]
ELSE
(xpath($1 || '/text()', $2))[1]
END;
$func$ LANGUAGE sql IMMUTABLE;
Handle multiple values
The above implementation doesn't handle multiple attributes at one xpath. Here is an overloaded version of f_xml_extract_val()
for that. With the 3rd parameter you can pick one
(the first), all
or dist
(distinct) values. Multiple values are aggregated to a comma-separated string.
CREATE OR REPLACE FUNCTION f_xml_extract_val(_path text, _node xml, _mode text)
RETURNS text AS
$func$
DECLARE
_xpath text := CASE
WHEN $1 ~~ '%/' THEN $1 || 'text()'
WHEN lower($1) ~~ '%/text()' THEN $1
WHEN $1 ~ '@\w+$' THEN $1
ELSE $1 || '/text()'
END;
BEGIN
-- fetch one, all or distinct values
CASE $3
WHEN 'one' THEN RETURN (xpath(_xpath, $2))[1]::text;
WHEN 'all' THEN RETURN array_to_string(xpath(_xpath, $2), ', ');
WHEN 'dist' THEN RETURN array_to_string(ARRAY(
SELECT DISTINCT unnest(xpath(_xpath, $2))::text ORDER BY 1), ', ');
ELSE RAISE EXCEPTION
'Invalid $3: >>%<<', $3;
END CASE;
END
$func$ LANGUAGE plpgsql;
COMMENT ON FUNCTION f_xml_extract_val(text, xml, text) IS '
Extract element of an xpath from XML document
Overloaded function to f_xml_extract_val(..)
$3 .. mode is one of: one | all | dist'
Call:
SELECT f_xml_extract_val('//city', x, 'dist');
Main part
Name of target table: tbl
; prim. key: id
:
CREATE OR REPLACE FUNCTION f_sync_from_xml()
RETURNS boolean AS
$func$
DECLARE
datafile text := 'path/to/my_file.xml'; -- only relative path in db dir
myxml xml := pg_read_file(datafile, 0, 100000000); -- arbitrary 100 MB
BEGIN
-- demonstrating 4 variants of how to fetch values for educational purposes
CREATE TEMP TABLE tmp ON COMMIT DROP AS
SELECT (xpath('//some_id/text()', x))[1]::text AS id -- id is unique
, f_xml_extract_val('//col1', x) AS col1 -- one value
, f_xml_extract_val('//col2/', x, 'all') AS col2 -- all values incl. dupes
, f_xml_extract_val('//col3/', x, 'dist') AS col3 -- distinct values
FROM unnest(xpath('/xml/path/to/datum', myxml)) x;
-- 1.) DELETE?
-- 2.) UPDATE
UPDATE tbl t
SET ( col_1, col2, col3) =
(i.col_1, i.col2, i.col3)
FROM tmp i
WHERE t.id = i.id
AND (t.col_1, t.col2, t.col3) IS DISTINCT FROM
(i.col_1, i.col2, i.col3);
-- 3.) INSERT NEW
INSERT INTO tbl
SELECT i.*
FROM tmp i
WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE id = i.id);
END
$func$ LANGUAGE plpgsql;
Important notes
This implementation checks on a primary key if the inserted row exists already and updates in this case. Only new rows are inserted.
I use a temporary staging table to speed up the procedure.
Tested with Postgres 8.4, 9.0 and 9.1.
XML has to be well-formed.
pg_read_file()
has restrictions to it. The manual:Use of these functions is restricted to superusers.
And:
Only files within the database cluster directory and the
log_directory
can be accessed.
So you have to put your source file there - or create a symbolic link to your actual file/directory.
Or you can provide the file via Java in your case (I did it all inside Postgres).
Or you can import the data into 1 column of 1 row of a temporary table and take it from there.
Or you can use lo_import
like demonstrated in this related answer on dba.SE.
This blog post by Scott Bailey helped me.

- 605,456
- 145
- 1,078
- 1,228
-
what i must do if the xml rows contain reserved character such as "'" or "&"?? – Mohammed Falha Dec 22 '14 at 19:34
Postgres has (thanks to Daniel Lyons for pointing it out) native XML support which you can use to store your table. If you however want to shred your XML data manually, there are different possibilities to represent XML data in a database. The first question should be, if you want a very generic solution, that will be able to store any XML document or one that is specific to your domain (i.e. only allows XML documents of a certain structure). Depending on that, you will have a very flexible, universal representation which is however harder to query (the SQL needed will be quite complicated). If you have a more specific approach, the queries will be simpler, but you will need to create new tables or add new attributes to existing talbes every time you want to store another type of document or add a field to an existing document; so changing the schema will be harder (which is one major advantage of XML). This presentation should give you some ideas what are the different possibilities.
Also, you might consider to switch to some DB that supports Xquery, like DB2. The ability to natively query using XQuery, a language targeted at processing XML, will simplify things a lot.
UPDATE: Given your comment, your XML data (that you linked to) is perfectly relational. It can be mapped 1:1 to the following table:
CREATE TABLE mynt (
ID SERIAL ,
myntnafn CHAR(3) ,
myntheiti Varchar(255) ,
kaupgengi Decimal(15,2) ,
midgengi Decimal(15,2) ,
solugengi Decimal(15,2) ,
dagsetning TimeStamp
)
So any mynt
tag would be a record in the table and the corresponding sub-tags the attributes. The data types I gathered from your data, they might be wrong. The main problem is, IMO, that there is no natural primary key, so I added an autogenerated one.

- 20,544
- 2
- 29
- 55
-
4[PostgreSQL has had native XML support](http://www.postgresql.org/docs/current/static/datatype-xml.html) [since version 8.2](http://www.postgresql.org/docs/8.2/static/datatype-xml.html). – Daniel Lyons Sep 20 '11 at 20:59
-
-
@Daniel: I updated it now to reflect the fact that PostgreSQL has XML. However, DB2 still has the advantage of having XQuery, from what I gathered. – Janick Bernet Sep 20 '11 at 21:04
-
Im going to use postgres but the xml data i linked in my comment above is the only kind of data that i will use. All XML that i will need to insert in my table will be like this xml so i will never net to add new attributes or anything like that. I going to google little bit more and try to find something, but if you have any idea what i can use to do this please share :). (sorry if my English is poor, not my strongest side). – Sep 20 '11 at 21:20
-
-
-
1@MaxvonHippel they appear to be working today, but regardless, I can't edit links in a comment from four years ago, so it would be better for you to actually just leave corrected links. – Daniel Lyons Oct 24 '16 at 14:00
-
@DanielLyons you're right, sorry. Combined fault of Nepali internet and me Stack-Overflowing after about 48 straight hours of coding. – Max von Hippel Oct 24 '16 at 15:15
PostgreSQL has an XML datatype. There are lots of XML specific functions you can use to query and modify the data, such as with xpath.
From the Java side, you can pretend you're just working with strings, but know that the data is well-formed on the way out and it won't let you store non-well-formed data.

- 22,421
- 2
- 50
- 77