I'm trying to query a large XML file that I loaded into a Hive table:

CREATE TABLE test (
xmlfile STRING
);

Full XML is here: http://bpaste.net/show/178819/

<dataroot>
    <AccessPoint>
       <Denominazione>Piazza G.Garibaldi</Denominazione>
       <Latitudine>41.9607</Latitudine>
       <longitudine>12.7963</longitudine>
       <Indirizzo>Piazza G.Garibaldi</Indirizzo>
       <Comune>Tivoli</Comune>
       <Tipologia>Privati federati</Tipologia>
    </AccessPoint>
    <AccessPoint>
       <Denominazione>Piazza Tempio D&apos;Ercole</Denominazione>
       <Latitudine>41.9653</Latitudine>
       <longitudine>12.7977</longitudine>
       <Indirizzo>Piazza Tempio D&apos;Ercole</Indirizzo>
       <Comune>Tivoli</Comune>
       <Tipologia>Privati federati</Tipologia>
    </AccessPoint>
    ...
</dataroot>

So I'm trying to query with

SELECT XPATH(xmlfile,'dataroot/AccessPoint/Denominazione/text()') FROM test;
...
Job running in-process (local Hadoop)
[Fatal Error] :1:1: Content is not allowed in prolog.
Adriano Foschi
  • possible duplicate of [org.xml.sax.SAXParseException: Content is not allowed in prolog](http://stackoverflow.com/questions/5138696/org-xml-sax-saxparseexception-content-is-not-allowed-in-prolog) – Raedwald Jul 18 '14 at 10:20

1 Answer

Solved. There were two problems:

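  1. The XML file was malformed: it started with a UTF-8 byte order mark (shown as M-oM-;M-? by cat -A) and had CRLF line endings (shown as ^M):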

    [bash]$ cat -A xmlfile
    M-oM-;M-?<?xml version="1.0" encoding="UTF-8"?>^M$
    ...
    
  2. The XML must be all on one line, because Hive reads each line of a plain text file as a separate row. I did it quickly with :%j in vim.
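
For anyone who wants to script both fixes instead of doing them by hand, here is a minimal Python sketch. It assumes the file is UTF-8, that the leading bytes shown by cat -A are the UTF-8 byte order mark, and that collapsing indentation between tags is acceptable (it is roughly what :%j does). The names flatten_xml and flatten_file are made up for this example and are not part of Hive or any library.

```python
import codecs


def flatten_xml(raw: bytes) -> str:
    """Return the XML as one line, without a UTF-8 BOM or CR/LF characters."""
    # Drop the UTF-8 byte order mark (bytes EF BB BF) if the file starts with it.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    # splitlines() splits on CRLF and LF alike; strip each line's indentation
    # and join the non-empty pieces with single spaces.
    parts = [line.strip() for line in text.splitlines()]
    return " ".join(part for part in parts if part)


def flatten_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, write the one-line version to dst."""
    with open(src, "rb") as f:
        flat = flatten_xml(f.read())
    # newline="\n" keeps Python from writing CRLF on Windows.
    with open(dst, "w", encoding="utf-8", newline="\n") as f:
        f.write(flat + "\n")
```

Calling flatten_file("xmlfile", "xmlfile.flat") (the output name is arbitrary) should give a file that can be loaded into the table as a single row. Note that this also collapses line breaks inside element text, which does not matter for the data shown here but might for other documents.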

Adriano Foschi