0

I have directories with xml files. Each XML file looks similar to this:

<songs>
<song>
    <title>Some Title</title>
    <artist>Artist</artist>
</song>
<song>
    <title>Other Title</title>
    <artist>Artist 1</artist>
</song>
<song>
    <title>Some Title</title>
    <artist>Artist/artist>
</song>
</songs>

My Postgresql table have three columns (title, artist, date of insert). I'm using RecursiveDirectoryIterator to iterate through all xml files.

QUESTIONS:

  1. What will be the best approach to accomplish bulk Upsert (the upsert will be insert if not exists and update the date of insert if exist. The distinct is artist and title pair)?

  2. Is there any way to pass the XML directly to the Postgresql by getting it with:

    simplexml_load_file
    

    Or i've to get it as string with:

    file_get_contents
    

    And then parse it back to XML within the Postgresql.

  3. What will be the preferred way to accomplish this task. Generate a huge XML in the php and then pass it to Postgresql, or send it file by file(file chunks method)?

I've tried Inserting by Rows from the XML but this will be extremely inefficient when having lets say 100XML files with 100 songs each. So I'm trying to wrap my head about the way to Bulk Upsert on File level, but I'm stuck..

ANY help will be highly appreciated...

EDIT:1 I want to accomplish this:

INSERT INTO Songs(Artist, Title, DateOfInsertion) 
SELECT x.Artist,
      ,x.Title
FROM xpath('//song', xml('<songs><song><title>Some Title</title><artist>Artist</artist></song><song><title>Other Title</title><artist>Artist 1</artist></song><song><title>Some Title</title><artist>Artist/artist></song></songs>'))) x;
EXCEPTION WHEN unique_violation THEN
UPDATE Songs
SET DateAdded = (now() at time zone 'utc')
WHERE (?????)

EDIT:2 This is what i came up with:

CREATE OR REPLACE FUNCTION insertSongs(xml TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
    -- first try to update the key
    UPDATE Songs 
    SET DateAdded = (now() at time zone 'utc')
    FROM unnest(xpath('//song', xml(xml))) x
    WHERE Artist = extract_value('//artist', x)
        AND Title = extract_value('//title', x);
    IF found THEN
        RETURN;
    END IF;
    -- not there, so try to insert the key
    -- if someone else inserts the same key concurrently,
    -- we could get a unique-key failure
    BEGIN
        INSERT INTO Songs(Artist, Title) 
        SELECT extract_value('//artist', x) 
            ,extract_value('//title', x)
        FROM unnest(xpath('//song', xml(xml))) x;
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- Do nothing, and loop to try the UPDATE again.
    END;
END LOOP;
END;
$$
LANGUAGE plpgsql;

select insertSongs('<songs><song><artist>A</artist><title>A</title></song><song><artist>B</artist><title>B</title></song></songs>')

And The Extract_Value function looks like this:

CREATE OR REPLACE FUNCTION extract_value(
   VARCHAR,
   XML
) RETURNS TEXT AS
$$
   SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
   THEN (xpath($1, $2))[1]
   WHEN $1 ~* '/text()$'
   THEN (xpath($1, $2))[1]
   WHEN $1 LIKE '%/'
   THEN (xpath($1 || 'text()', $2))[1]
   ELSE (xpath($1 || '/text()', $2))[1]
   END::text;

$$ LANGUAGE 'sql' IMMUTABLE;

But now only one of the statements is executing.(e.g) If i have Record for Song with Artist A and Title A already in the DB but not having Artist B and Title B the procedure updates only A A Date of insertion and skips the insert for B B. If i don't have both, it inserts them.

sla55er
  • 791
  • 1
  • 8
  • 16
  • And your question finally is? And not to bark the wrong tree: I'm pretty sure postgres has good XML import from the commandline, so perhaps all you would need to do in PHP is to create on XML file from all the subdirectories... . – hakre Sep 14 '14 at 18:56
  • From the Question: QUESTIONS: What will be the best approach to accomplish bulk Upsert (the upsert will be insert if not exists and update the date of insert if exist. The distinct is artist and title pair)? Is there any way to pass the XML directly to the Postgresql by getting it with: simplexml_load_file Or i've to get it as string with: file_get_contents And then parse it back to XML within the Postgresql. – sla55er Sep 15 '14 at 05:33
  • Thanks for the clarification. If you have further question(s) please reference duplicate material and why it didn't work for you. Thanks! Also there is a Q&A site dedicated fully to database administration over here: http://dba.stackexchange.com/ – hakre Sep 15 '14 at 08:47
  • I don't think that any of the related answers to the Duplicated link question is relevant to any of my questions. – sla55er Sep 15 '14 at 10:39
  • According to your comment you were looking for a method to import XML into postgresql. That was only one question covering it, for sure there are many more duplicate questions. As written, if you think that one is unfitting, please reference the question that you think is unfitting and explain why. That should make more clear what your question makes it different from that one and not stopping at one - of the many other ones that cover XML import into postgresql. Another approach is to only ask about that single feature you're looking for in postgresql - again search first. – hakre Sep 15 '14 at 11:40
  • Another hint: Questions consisting of many different questions do not work well on a Q&A site like Stackoverflow. Aways try to ask one specific question at a time. Additionally, questions like "What will be the preferred way to accomplish this task. [this] or [that] (both working)" can not really be answered for the scope of a wider audience unless the preconditions are clearly outlined (which most often renders such questions automatically answered). More help is in the help center: http://stackoverflow.com/help – hakre Sep 15 '14 at 11:43

0 Answers0