0

I have the following string (XML-like):

<News News-type="alert" ID="498" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="1507" NewsPath="GetNewsFrom[3]" NewsMark="0"/>
<News News-type="alert" ID="1509" NewsPath="GetNewsFrom[3]" NewsMark="0"/>
<News News-type="alert" ID="1511" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="1520" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="2999" NewsPath="data-theft[1]" NewsMark="0" />
<News News-type="alert" ID="2535" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="6052" NewsPath="GetNewsFrom[3]" NewsMark="100" />

I cannot run an XML reader/parser on it; the parser reports that it is not a well-formed XML file. How can I get the following output from these strings?

String[] attr = {"News-type", "ID", "NewsPath", "NewsMark"};
String[] values = new String[4];
// The values should be filled into the array dynamically as well
int i;
for (i = 0; i < 4; i++)
{
    if (i == 0)
        values[i] = ????;
    else if (i == 1)
    ...
}

How can I get all the attribute values into the values[] array so that I can use them later?

Exception:
I got this exception while parsing it as an XML file in Java:

[Fatal Error] :2:2: The markup in the document following the root element must be well-formed.
Mar 18, 2014 11:43:21 AM GUI.NewsReport jMenuItem2ActionPerformed
SEVERE: null
org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 2; The markup in the document following the root element must be well-formed.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at GUI.NewsReport.ReadXML(NewsReport.java:185)
    at GUI.NewsReport.jMenuItem2ActionPerformed(NewsReport.java:126)
    at GUI.NewsReport.access$100(NewsReport.java:33)
    at GUI.NewsReport$2.actionPerformed(NewsReport.java:88)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
    at java.awt.Component.processMouseEvent(Component.java:6505)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3320)
    at java.awt.Component.processEvent(Component.java:6270)
    at java.awt.Container.processEvent(Container.java:2229)
    at java.awt.Component.dispatchEventImpl(Component.java:4861)
    at java.awt.Container.dispatchEventImpl(Container.java:2287)
    at java.awt.Component.dispatchEvent(Component.java:4687)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
    at java.awt.Container.dispatchEventImpl(Container.java:2273)
    at java.awt.Window.dispatchEventImpl(Window.java:2719)
    at java.awt.Component.dispatchEvent(Component.java:4687)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
    at java.awt.EventQueue.access$200(EventQueue.java:103)
    at java.awt.EventQueue$3.run(EventQueue.java:694)
    at java.awt.EventQueue$3.run(EventQueue.java:692)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
    at java.awt.EventQueue$4.run(EventQueue.java:708)
    at java.awt.EventQueue$4.run(EventQueue.java:706)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)

  • Thanks a lot!
  • What about using regexp? This text seems to have easy to match patterns. I will try to write one in answer. – Leos Literak Mar 17 '14 at 12:54
  • What said it's not well-formed XML? – Alexis Leclerc Mar 17 '14 at 12:57
  • @LeosLiterak Please do not use regex to parse XML. Here are some examples of why it is a bad plan. [LINK](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg) – Dan Temple Mar 17 '14 at 12:57
  • Can't you just make the input well-formed? E.g. you could just add an opening root tag at the start and the matching closing tag at the end of your input. – Steven Pessall Mar 17 '14 at 12:59
  • The reason it's not well formed is because you have multiple root nodes - the `News` elements. Either treat each `News` element as an individual document or wrap the lot with an arbitrary `root` element – Nick Holt Mar 17 '14 at 12:59
  • @AlexisLeclerc I parsed it in code, and the exception told me that it's not well-formed XML. I also validated it with the online validator http://www.xmlvalidation.com/, which returns the same error: `The markup in the document following the root element must be well-formed.` – Muhammad Abdullah Mar 17 '14 at 13:00
  • @NickHolt I have tried adding a wrapping root element at the start and end, but with no success. – Muhammad Abdullah Mar 17 '14 at 13:02
  • @MuhammadAbdullah: What do you mean by "no success"? That should have parsed. If it didn't, either your code or your data has problems not shown here and you need to give us more specifics. – keshlam Mar 17 '14 at 13:12

3 Answers

3

There's no single root element, so it isn't a well-formed XML document... though it may be a well-formed XML Document Fragment.

If the latter is true, the simplest way to parse it in Java is to implement a modified reader that wraps a dummy top-level element around it: precede the content with <wrapper> and follow it with </wrapper>, for example. Then write the rest of your application with the awareness that <wrapper> is not part of the original file content.
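
One possible way to build such a wrapping reader is sketched below. It is only an illustration: the class name WrappedFragmentReader and its methods are made up for this example, the input is assumed to be UTF-8, and the fragment is assumed not to start with an XML declaration (<?xml ...?>), which would no longer be at the start of the document once wrapped.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: presents a multi-root fragment to the parser as a
// single document by streaming <wrapper>, the fragment, then </wrapper>.
public class WrappedFragmentReader {

    // Concatenates the three streams lazily; the fragment is not copied into memory.
    static InputStream wrap(InputStream fragment) {
        InputStream open = new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragment, close)));
    }

    // Parses the wrapped fragment with the standard JAXP DOM parser.
    static Document parseFragment(InputStream fragment) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(wrap(fragment));
    }

    public static void main(String[] args) throws Exception {
        String fragment =
                "<News News-type=\"alert\" ID=\"498\" NewsPath=\"GetNewsFrom[3]\" NewsMark=\"0\" />\n"
              + "<News News-type=\"alert\" ID=\"1507\" NewsPath=\"GetNewsFrom[3]\" NewsMark=\"0\"/>";
        Document doc = parseFragment(
                new ByteArrayInputStream(fragment.getBytes(StandardCharsets.UTF_8)));
        // The document element is the dummy <wrapper>, not part of the original data.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("News").getLength());
    }
}
```

For a file, the same wrap method could be given a FileInputStream instead of the in-memory stream used here.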

keshlam
1

The simple way to solve this is to add a parent tag around all the News tags and then parse it like any other XML.

<NewsParent>
<News News-type="alert" ID="498" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="1507" NewsPath="GetNewsFrom[3]" NewsMark="0"/>
<News News-type="alert" ID="1509" NewsPath="GetNewsFrom[3]" NewsMark="0"/>
<News News-type="alert" ID="1511" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="1520" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="2999" NewsPath="data-theft[1]" NewsMark="0" />
<News News-type="alert" ID="2535" NewsPath="GetNewsFrom[3]" NewsMark="0" />
<News News-type="alert" ID="6052" NewsPath="GetNewsFrom[3]" NewsMark="100" />
</NewsParent>
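
Once the input is wrapped like this, the attribute values asked for in the question could be collected roughly as follows. This is a sketch under assumptions: the class name NewsAttributes and the method readValues are invented for the example, and the input is assumed to be available as a String.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps the News elements in <NewsParent> and reads
// the four attributes of each element into a String[] in a fixed order.
public class NewsAttributes {

    static final String[] ATTR = {"News-type", "ID", "NewsPath", "NewsMark"};

    static List<String[]> readValues(String fragment) throws Exception {
        // Add the parent tag so the parser sees exactly one root element.
        String xml = "<NewsParent>" + fragment + "</NewsParent>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("News");
        List<String[]> rows = new ArrayList<>();
        for (int n = 0; n < nodes.getLength(); n++) {
            Element news = (Element) nodes.item(n);
            String[] values = new String[ATTR.length];
            for (int i = 0; i < ATTR.length; i++) {
                // getAttribute returns an empty string if the attribute is absent.
                values[i] = news.getAttribute(ATTR[i]);
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String fragment =
                "<News News-type=\"alert\" ID=\"498\" NewsPath=\"GetNewsFrom[3]\" NewsMark=\"0\" />\n"
              + "<News News-type=\"alert\" ID=\"6052\" NewsPath=\"GetNewsFrom[3]\" NewsMark=\"100\" />";
        for (String[] values : readValues(fragment)) {
            System.out.println(String.join(", ", values));
        }
    }
}
```

Each String[] in the returned list plays the role of the values[] array from the question, with one array per News element.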
anirudh
0

Pre-processing the input so that it parses as XML should be the better option, but the alternative is to use a regular expression such as this: News-type="([^"]+?)"\s+ID="([^"]+?)"\s+NewsPath="([^"]+?)"\s+NewsMark="([^"]+?)" (in a Java string literal, each double quote and backslash must be escaped).

The above regex should match what you are after and place it in groups you can later access.

An explanation of the regex is available here.
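
A minimal sketch of how the groups could be read in Java follows. The class name NewsRegex and the method extract are made up for this example, and the approach assumes that every element uses double quotes and lists the four attributes in exactly this order; anything else will simply not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the four attribute values out of each News
// element with the regular expression above, written as a Java string literal.
public class NewsRegex {

    static final Pattern NEWS = Pattern.compile(
            "News-type=\"([^\"]+?)\"\\s+ID=\"([^\"]+?)\"\\s+"
          + "NewsPath=\"([^\"]+?)\"\\s+NewsMark=\"([^\"]+?)\"");

    static List<String[]> extract(String input) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = NEWS.matcher(input);
        // find() advances to the next match; groups 1..4 are the attribute values.
        while (m.find()) {
            rows.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return rows;
    }

    public static void main(String[] args) {
        String input =
                "<News News-type=\"alert\" ID=\"2999\" NewsPath=\"data-theft[1]\" NewsMark=\"0\" />\n"
              + "<News News-type=\"alert\" ID=\"6052\" NewsPath=\"GetNewsFrom[3]\" NewsMark=\"100\" />";
        for (String[] values : extract(input)) {
            System.out.println(String.join(", ", values));
        }
    }
}
```

Unlike an XML parser, this does not decode entities such as &amp;quot; inside attribute values, which is one of the reasons the comments above advise against regex here.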

npinti