2

I need to extract three columns of data from an xml column in postgres, such that the xml is expanded into it's appropriate columns. One of these columns needs to be an attribute of one nesting level of the xml, and the other columns are attributes of the nest one level down. The data from the higher level should be repeated. Is this possible? See the example below for something concrete.

Thanks, --sw

Consider the below query:

with x as (select
'<catalog catalog-id="manufacturer-catalog-id">
    <category-assignment category-id="category1" product-id="product1"/>
    <category-assignment category-id="category1" product-id="product2"/>
    <category-assignment category-id="category2" product-id="product3"/>
</catalog>'::xml as t
)
(
select 
    xpath('/catalog/@catalog-id', catalog_xml) catalog_id,  
    xpath('//@category-id', catalog_xml) category_assignment_category_id,
    xpath('//@product-id', catalog_xml) category_assignment_product_id
from (select unnest(xpath('/catalog', t)) catalog_xml from x) q
) 

This query returns this data:

"{manufacturer-catalog-id}";"{category1,category1,category2}";"{product1,product2,product3}"

This query:

with x as (select
'<catalog catalog-id="manufacturer-catalog-id">
    <category-assignment category-id="category1" product-id="product1"/>
    <category-assignment category-id="category1" product-id="product2"/>
    <category-assignment category-id="category2" product-id="product3"/>
</catalog>'::xml as t
)
(
select
    xpath('/catalog/@catalog-id', catalog_xml) catalog_id,  
    xpath('//@category-id', catalog_xml) category_assignment_category_id,
    xpath('//@product-id', catalog_xml) category_assignment_product_id
from (select unnest(xpath('/catalog/category-assignment', t)) catalog_xml from x) q
) 

---EDITED---

returns this data:

"{}";"{category1}";"{product1}"
"{}";"{category1}";"{product2}"
"{}";"{category2}";"{product3}"

I need this data:

"{manufacturer-catalog-id}";"{category1}";"{product1}"
"{manufacturer-catalog-id}";"{category1}";"{product2}"
"{manufacturer-catalog-id}";"{category2}";"{product3}"
Scott Wood
  • 1,077
  • 3
  • 18
  • 34

2 Answers2

5

I appeciate this question is a couple of years old, but I came here with a similar problem and believe I found an answer.

with x as (select
'<catalog catalog-id="manufacturer-catalog-id">
    <category-assignment category-id="category1" product-id="product1"/>
    <category-assignment category-id="category1" product-id="product2"/>
    <category-assignment category-id="category2" product-id="product3"/>
</catalog>'::xml as t
)
(
select 
       xpath('/catalog/@catalog-id', cat_node) catalog_id,
       xpath('/category-assignment/@category-id', cat_assn_list) category_id,
       xpath('/category-assignment/@product-id', cat_assn_list) product_id         
 from (select unnest(xpath('/catalog/category-assignment', t)) cat_assn_list, t cat_node from x) q
);

This gives

        catalog_id         | category_id | product_id
---------------------------+-------------+------------
 {manufacturer-catalog-id} | {category1} | {product1}
 {manufacturer-catalog-id} | {category1} | {product2}
 {manufacturer-catalog-id} | {category2} | {product3}
(3 rows)

This basically performs the base select which returns two columns 1) the an xpath to get assignment list (multiple rows) and 2) the original category node. The returned rows are then worked on by the higher level xpath statements - the category-id from the full category node column and the column level xpaths into the assignment list item.

I believe the OP's issue was that driving this purely off the single assignment list column means that, since postgres is returning xml nodesets at the appropriate level, rather than pointers into a single dom, the xml output returned by this is below the catalog level and that xml ndoeset cannot be traversed upwards e.g. with "ancestor::".

Hope this helps someoneelse.

Edit - I can't comment on the performance of this, as I believe the catalog-id xpath will be repeated for every assignment row within the same catalog node.

Dazed
  • 1,069
  • 9
  • 34
-1

I think you are going to have to do it in multiple stages. Here is as far as I got. You could then extract the elements here and tie them back to the parent:

with x as (select
'<catalog catalog-id="manufacturer-catalog-id">
    <category-assignment category-id="category1" product-id="product1"/>
    <category-assignment category-id="category1" product-id="product2"/>
    <category-assignment category-id="category2" product-id="product3"/>
</catalog>'::xml as t
), segs_raw as (select unnest(xpath('/catalog', t)) catalog_xml from x),
segs as (select catalog_xml, unnest(xpath('/catalog/@catalog-id', catalog_xml)) catalog_id from segs_raw)
select * from segs;

To coordinate things I think the next step is to pull the category assignment XML out along with the catalog ID, and then pull these out, so this gets you about half-way there. At leach level, you have to preserve the data you want to tabulate because otherwise you will get implicit cross-joins.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182