I'm trying to create an Hive table out of XML file using com.ibm.spss.hive.serde2.xml.XmlSerDe. This works quite good for the tags with single occurrence. But I've an issue with multiple occurrence.
Below is my source XML.
<Item>
<TimeStamp>2016-02-19T12:27:06.387Z</TimeStamp>
<AlsoSeen End="2014-08-21T13:44:32.557Z" Start="2014-08-21T13:44:04.637Z" />
<AlsoSeen End="2014-08-21T13:44:33.557Z" Start="2014-08-21T13:45:04.637Z" />
<AlsoSeen End="2014-08-21T13:44:34.557Z" Start="2014-08-21T13:46:04.637Z" />
<Title ID="112031424">FAULT IN OUR STARS, THE</Title>
<FileName>The Fault in Our Stars (2014) EXTENDED HDRip x264 AAC-CPG</FileName>
</Item>
Below is my Hive Table DDL
add jar hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE xml_test
(
Item_TimeStamp String
,Item_AS_Start String
,Item_AS_End String
,Item_Title String
,Item_ID String
,Item_Artist String
,Item_Author String
,Item_FileName String
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.Item_TimeStamp"="/Item/TimeStamp/text()",
"column.xpath.Item_AS_Start"="/Item/AlsoSeen/@Start",
"column.xpath.Item_AS_End"="/Item/AlsoSeen/@End",
"column.xpath.Item_Title"="/Item/Title/text()",
"column.xpath.Item_ID"="/Item/Title/@ID",
"column.xpath.Item_FileName"="/Item/FileName/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/xxxxxx/XML_text'
TBLPROPERTIES (
"xmlinput.start"="<Item>",
"xmlinput.end"="</Item>"
);
When I query the table I get the below value
2016-02-19T12:27:06.387Z <string>2014-08-21T13:44:04.637Z2014-08-21T13:45:04.637Z2014-08-21T13:46:04.637Z</string> <string>2014-08-21T13:44:32.557Z2014-08-21T13:44:33.557Z2014-08-21T13:44:34.557Z</string> FAULT IN OUR STARS, THE 112031424 The Fault in Our Stars (2014) EXTENDED HDRip x264 AAC-CPG
But I expected output should be 3 different lines for 3 difference occurances of AlsoSeen tag as below
2016-02-19T12:27:06.387Z 2014-08-21T13:44:04.637Z 2014-08-21T13:44:32.557Z FAULT IN OUR STARS, THE 112031424 The Fault in Our Stars (2014) EXTENDED HDRip x264 AAC-CPG
2016-02-19T12:27:06.387Z 2014-08-21T13:45:04.637Z 2014-08-21T13:44:33.557Z FAULT IN OUR STARS, THE 112031424 The Fault in Our Stars (2014) EXTENDED HDRip x264 AAC-CPG
2016-02-19T12:27:06.387Z 2014-08-21T13:46:04.637Z 2014-08-21T13:44:34.557Z FAULT IN OUR STARS, THE 112031424 The Fault in Our Stars (2014) EXTENDED HDRip x264 AAC-CPG
Can anyone please help me?