0

I am trying to extract xml into a a table output separated by rows.

The data is a CLOB field in Oracle Database as follows:

<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  <contactEmailList>
   <emailAddress>wp2@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
  <escalationEmailList>
   <emailAddress>sl2@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>

EDIT2: My updated SQL query is as follows:

             SELECT t.*, m.*, p.*, l.*
             FROM cisadm.F1_ext_lookup_val exval,

                  XMLTABLE ('/emailInfo/recipientList'
                     PASSING XMLTYPE (exval.bo_data_area)
                     COLUMNS recipient_name                VARCHAR2 (4000)  PATH 'recipientName',
                             recipient_email_list          XMLTYPE          PATH '/recipientEmailList',
                             contact_email_list            XMLTYPE          PATH '/contactEmailList',
                             escalation_email_list         XMLTYPE          PATH '/escalationEmailList') t,
                  XMLTABLE ('/recipientEmailList'
                     PASSING (t.recipient_email_list)
                     COLUMNS recipient_email_address       VARCHAR2 (4000)  PATH '/emailAddress',
                             rec_email_status_flg          VARCHAR2 (10)    PATH '/statusFlag') m,
                  XMLTABLE ('/contactEmailList'
                     PASSING (t.contact_email_list)
                     COLUMNS contact_email_address         VARCHAR2 (4000)  PATH 'contactEmailList/emailAddress',
                             contact_email_status_flg      VARCHAR2 (10)    PATH 'contactEmailList/statusFlag'
                             ) p,
                  XMLTABLE('/escalationEmailList'
                     PASSING (t.escalation_email_list)
                     COLUMNS     esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                                 esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
                      ) l

I am trying to provision for the fact that there may be multiple values for each Recipient email list, contact email list, and escalation email list.

Sample output should be:

SampleOutput

Any help would be so appreciated!

Heisenberg
  • 267
  • 3
  • 6
  • 15
  • couple of links might help you https://community.oracle.com/thread/2239392?tstart=0 and http://stackoverflow.com/questions/18589276/xml-oracle-multiple-child-node-extract – Pravin Satav Sep 24 '15 at 05:55
  • Consider using a general purpose language like Python, PHP, C#, Java which will be scalable and maintainable as SQL is a special-purpose language and various RDMS handle XML data differently. Some weeks back, also for an Oracle question I laid out solutions in the reverse SQL->XML using [open-source](http://stackoverflow.com/questions/32424049/build-an-xml-with-xmlelement-oracle-sql-11g-query/32425800#32425800). – Parfait Sep 25 '15 at 04:03

3 Answers3

1

For future readers, here are general-purpose solutions in open-source programming to migrate XML data from a CLOB field into csv tabular format.

Using the OP's data needs, these approaches are not dependent on any RDMS and hence can be used in other database connections. Additionally, limitations of SQL are overcome as various nuances like xpaths, arrays, loops can be used:

Python (using cx_Oracle):

#!/usr/bin/python
import os
import cx_Oracle
import csv
import lxml.etree as ET

# SET DIRECTORY PATH
cd = os.path.dirname(os.path.abspath(__file__))

# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd@database")    
cur = db.cursor()
clob = cur.execute("SELECT CLOBfield FROM OracleTable").fetchone()

# CLOSE CURSOR AND DATABASE
cur.close()
db.close()

# PARSE XML CONTENT
dom = ET.fromstring(clob)

# DEFINING COLUMNS
columns = ['RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
           'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
           'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG']

emailnodes = ['recipientEmailList', 'contactEmailList', 'escalationEmailList']

# OPEN CSV FILE
with open(os.path.join(cd,'CLOB_Py.csv'), 'w', newline='') as m:
    writer = csv.writer(m)    
    writer.writerow(columns)

    nodexpath = dom.xpath('//recipientList')

    dataline = []    
    for j in range(1,len(nodexpath)+1):

        dataline = []        
        dataline.append(dom.xpath('//recipientList[{0}]/recipientName'.format(j))[0].text)

        for n in emailnodes:   
            # EMAILS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[1]'.format(j, n))            

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

            # FLAGS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[2]'.format(j, n))

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

        writer.writerow(dataline)

PHP (using PDO Oracle OCI)

// Set Directory Path
$cd = dirname(__FILE__);

// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";

try {
    $dbh = new PDO($db,$db_username,$db_password);          
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $sql = "SELECT CLOBfield FROM OracleTable";    
    $STH = $dbh->query($sql);    
    $clob = $STH->fetch();
}

catch(PDOException $e) {  
    echo $e->getMessage();
    exit;
}

# Closing db connection
$dbh = null;

// Loading XML source
$xpath = simplexml_load_string($clob);

// Writing column headers
$columns = array('RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
                 'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
                 'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG');

$emailnodes = array('recipientEmailList', 'contactEmailList', 'escalationEmailList');

$fs = fopen($cd.'/CLOB_PHP.csv', 'w');
fputcsv($fs, $columns);      
fclose($fs);    

// Writing data lines
$i = 1;
$values = [];
$node = $xpath->xpath('//recipientList');    

foreach ($node as $n){

     $child = $xpath->xpath('//recipientList['. $i .']/recipientName');
     foreach($child as $value) {            
          $values[] = $value;         
     }

     foreach ($emailnodes as $e){

          // EMAILS       
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[1]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }

          // FLAGS
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[2]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }
     }  

     $fs = fopen($cd.'/CLOB_PHP.csv', 'a');
     fputcsv($fs, $values);      
     fclose($fs);  

     $values = [];
     $i++;

}

R (using ROracle):

library(XML)
library(ROracle)

setwd("C:\\Path\\To\\R\\Script")

# OPEN DATABASE AND QUERY
conn <-dbConnect(drv, username = "", password = "", dbname = "")
clobdf <- dbGetQuery(conn, "SELECT CLOBfield FROM OracleTable;")
dbDisconnect(conn)

# READ IN EXTERNAL DATA FILE
doc<-xmlParse(clobdf[[1,1]])

emailnodes <- c('recipientEmailList', 'contactEmailList', 'escalationEmailList')

# EXTRACT NODE VALUES INTO LISTS
recipientNamesList <- xpathSApply(doc, paste0("//recipientList/recipientName"), xmlValue)

for (e in emailnodes){
    assign(e, xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[1]"), xmlValue))
}

for (e in emailnodes){
  assign(paste0(e, "flg"), xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[2]"), xmlValue))
}

# COMBINE LISTS TO DATA FRAME
xmldf<- data.frame(RECIPENT_NAME =  matrix(unlist(recipientNamesList), nrow=2, byrow=T),
                   RECIPIENT_EMAIL_ADDRESS = matrix(unlist(recipientEmailList), nrow=2, byrow=T),
                   REC_EMAIL_STATUS_FLG  = matrix(unlist(recipientEmailListflg), nrow=2, byrow=T),
                   CONTACT_EMAIL_ADDRESS = matrix(unlist(contactEmailList),   nrow=2, byrow=T),                
                   CONTACT_EMAIL_STATUS_FLG = matrix(unlist(contactEmailListflg),   nrow=2, byrow=T),                
                   ESC_EMAIL_ADDRESS = matrix(unlist(escalationEmailList), nrow=2, byrow=T),
                   ESC_EMAIL_STATUS_FLG = matrix(unlist(escalationEmailListflg), nrow=2, byrow=T))   

# OUTPUT TO CSV
write.csv(xmldf, "CLOB_R.csv", na = "", row.names=FALSE)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • +1 for efford but you could have put the xml data from the clob into a xmltype field: UPDATE TABLE SET xmltypefield = XMLTYPE ( ( SELECT elefantclobxml from africa ) ) – Jochen Schultz Mar 30 '16 at 09:02
0

This query returns the data as in screenshot -

select 
    extractvalue(s.column_value, '/*/recipientName') as recipient_name,
    extractvalue(s.column_value, '/*/recipientEmailList/emailAddress') as recipient_email_address,
    extractvalue(s.column_value, '/*/recipientEmailList/statusFlag') as rec_email_status_flg,
    extractvalue(s.column_value, '/*/contactEmailList/emailAddress') as contact_email_address,
    extractvalue(s.column_value, '/*/contactEmailList/statusFlag') as contact_email_status_flg,
    extractvalue(s.column_value, '/*/escalationEmailList/emailAddress') as esc_email_address,
    extractvalue(s.column_value, '/*/escalationEmailList/statusFlag') as esc_email_status_flg
from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s

and this query extract each email on a separate line -

select recipient_name, email_address, status_flag
 from
(
    select 
           recipient_name,
           extractvalue(x.column_value, '/*/emailAddress') as email_address,
           extractvalue(x.column_value, '/*/statusFlag') as status_flag
    from
    (
        select 
            extractvalue(s.column_value, '/*/recipientName') as recipient_name,
            EXTRACT(s.column_value, '/*') recipients
        from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
    ) v, table(xmlsequence(EXTRACT(v.recipients, '/*/*'))) x
)
where (email_address is not null or status_flag is not null)
Stawros
  • 918
  • 1
  • 10
  • 20
0

You may try xmltable

SELECT *
    FROM XMLTable('/emailInfo/recipientList' PASSING XMLTYPE('<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>')
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t

Same from table

SELECT *
    FROM tmp,XMLTable('/emailInfo/recipientList' PASSING XMLTYPE(tmp.bo_data_area)
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t
  • This yields: ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence. My guess is because some names have multiple email addresses? How do I resolve this issue? – Heisenberg Sep 24 '15 at 07:18
  • Please post XML example to let me advice something specific – Anton Zaviriukhin Sep 24 '15 at 07:31
  • I have updated the XML example to include multiple email addresses per Recipient ('Escalation Email List', 'Contact Email List'). Thanks! – Heisenberg Sep 25 '15 at 01:42