I am trying to use DocumentBuilder to parse one large file that contains multiple documents. When I run my program, I get this error: "The markup in the document following the root element must be well-formed."

I think it's because the file has no actual root element; it is a TextEdit document. The structure is the following:

<DOC>
<DOCNO> AP890106-0001 </DOCNO>
<FILEID>AP-NR-01-06-89 0033EST</FILEID>
<FIRST>r a PM-BRF--Heidnik     01-06 0136</FIRST>
<SECOND>PM-BRF--Heidnik,0139</SECOND>
<HEAD>Torture-Murderer In Fair Condition, Conscious</HEAD>
<DATELINE>PITTSBURGH (AP) </DATELINE>
<TEXT>
   Convicted torture-murderer Gary Heidnik has
regained consciousness after apparently attempting suicide in his
prison cell with a drug overdose, prison officials said.
   Heidnik's condition was upgraded to fair Thursday, but he
remained under tight security in the intensive care unit of West
Penn Hospital, said Tom Seiverling, a spokesman for the State
Correctional Institution at Pittsburgh.
   Heidnik, 45, was semi-comatose earlier this week after being
found unconscious in his cell Sunday. Prison officials believe
Heidnik stored up medications that were prescribed for him by
pretending to take them at the designated times.
   The self-proclaimed minister faces the death sentence for the
slayings of two of six women he kept chained in the basement of his
Philadelphia row house. He was convicted and sentenced last July.
</TEXT>
</DOC>
<DOC>
<DOCNO> AP890106-0002 </DOCNO>
<FILEID>AP-NR-01-06-89 0524EST</FILEID>
<FIRST>d a PM-BRF--DrivingToddler     01-06 0162</FIRST>
<SECOND>PM-BRF--Driving Toddler,0166</SECOND>
<HEAD>3-Year-Old Takes Careening First Drive; Emerges Unharmed</HEAD>
<DATELINE>CAZENOVIA, N.Y. (AP) </DATELINE>
<TEXT>
   Going out to buy a puppy, Cecilia Kaler
placed her three-year-old son in a child seat, left the car running
and got out to clear snow from the windshield. She never finished
the job.
   As soon as his mother closed the door, little Michael Kaler
locked it, put the car in drive, and rode away Wednesday. The car
went down the driveway, across a busy road, narrowly missed a tree
and fire hydrant, rolled on its side down an embankment and finally
came to rest in a creek.
   Michael was wet, cold and otherwise unharmed, said Kaler, a
resident of this community 15 miles southeast of Syracuse.
   A nearby man heard Kaler screaming and rushed over. He smashed a
window and freed little Michael.
   ``Anybody who says there's no God doesn't know what they're
talking about, because someone certainly was looking out for him,''
Kaler said Thursday.
</TEXT>
</DOC>

I would like to separate the documents by their <DOC> and </DOC> tags.

My code so far:

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

    Document doc = dBuilder.parse(document);
    doc.getElementsByTagName("doc").toString();
  • An XML document can have only one root element. You have two, so your XML document is malformed. If you need a list of <DOC> elements, you should create a root element to contain them. – Andreas Feb 10 '20 at 20:16

1 Answer

Your file cannot be parsed as it stands because it has no single root element. Your

<DOC> </DOC>

blocks must be enclosed in one container tag; choose any name you like. Once the XML is well-formed, you can parse it. Note that XML tag names are case-sensitive, so look the elements up as "DOC", not "doc".

Example:

<mytag>
    <DOC> ........</DOC>
    <DOC>........... </DOC>
</mytag>
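
If editing the large file by hand is impractical, the wrapper can also be added at read time. The following is only a sketch of one way to do that, not the only approach: it prepends and appends a synthetic root with SequenceInputStream and then parses as usual. The class name WrapDocs, the method parseWithSyntheticRoot, and the root name ROOT are made up for illustration. It assumes the file has no XML declaration at the top and that the text contains no unescaped characters such as & or <, which would still make the parser fail.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WrapDocs {

    // Hypothetical helper: surrounds the raw stream with <ROOT> ... </ROOT>
    // so the parser sees exactly one root element.
    static Document parseWithSyntheticRoot(InputStream body) throws Exception {
        InputStream open = new ByteArrayInputStream("<ROOT>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</ROOT>".getBytes(StandardCharsets.UTF_8));
        // SequenceInputStream reads the three streams one after another.
        InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, body, close)));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(wrapped);
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory stand-in for the real file; for a file on disk,
        // pass a FileInputStream instead.
        String sample =
                "<DOC>\n<DOCNO> AP890106-0001 </DOCNO>\n</DOC>\n"
              + "<DOC>\n<DOCNO> AP890106-0002 </DOCNO>\n</DOC>\n";
        Document doc = parseWithSyntheticRoot(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));

        // Tag names are case-sensitive: "DOC", not "doc".
        NodeList docs = doc.getElementsByTagName("DOC");
        for (int i = 0; i < docs.getLength(); i++) {
            Element e = (Element) docs.item(i);
            String docNo = e.getElementsByTagName("DOCNO").item(0).getTextContent().trim();
            System.out.println(docNo);
        }
    }
}
```

Each Element in the resulting NodeList is one <DOC> block, so its child elements (DOCNO, HEAD, TEXT and so on) can be read with the same getElementsByTagName call.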