parse text from xml

Question

I have the following link:

https://hero.epa.gov/hero/ws/swift.cfc?method=getProjectRIS&project_id=993&getallabstracts=true

I want to parse this XML and get only the text, like:

Provider: HERO - 2.xx
DBvendor=EPA
Text-encoding=UTF-8

How can I parse it?

If you view the source code of that page, you'll see the [wddxPacket](https://en.wikipedia.org/wiki/WDDX) you mentioned. You *might* be able to [parse it as XML](https://stackoverflow.com/questions/3962866/what-is-the-easiest-way-to-extract-plain-text-from-an-xml-document)... though I haven't tried. — showdev, May 23 '17 at 18:34
You can install ARC ([Advanced Rest Client](https://chrome.google.com/webstore/detail/advanced-rest-client/hgmloofddffdnphfgcellkdfbfbjeloo?utm_source=chrome-app-launcher-info-dialog)) from the Chrome Web Store to get more control over the headers sent and to see the request and response headers and content. — cyberbrain, May 23 '17 at 18:55

score 2 · Accepted Answer · answered May 23 '17 at 18:35

Well, it's not a text file, it's an HTML file. If you open the file in a browser and select "View Source", you will see the text enclosed in <char> tags.

When it's opened in a browser, these tags and the other HTML content are interpreted and the output is rendered on the page (that's why it looks like plain text). If you want to implement similar behavior in Java, you should look into PhantomJS and/or JSoup examples.

answered May 23 '17 at 18:35

Darshan Mehta


I need to parse this file. Is there any easy way to do it? Like converting the file into XML/JSON, etc.? – user1631306 May 23 '17 at 19:24
I guess JSoup will be your best friend in this case. – Darshan Mehta May 24 '17 at 21:03

score 0 · Answer 2 · answered May 23 '17 at 18:34

It looks like a text file, but it is an XML file and the browser just displays its text content. To verify, right-click and look at the page source.
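Since the response is XML, one way to get the text back in Java is the JDK's built-in DOM parser, with no extra library. The following is only a minimal sketch: the class name `WddxText`, the method `extractText`, and the sample packet are made up for illustration. It assumes the payload sits in WDDX `<string>` elements and that `<char code='0a'/>` stands for the character with that hexadecimal code (here a newline). The real response may be structured differently, so check the page source first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class WddxText {

    /** Returns the text of every <string> element, decoding <char code="xx"/> children. */
    public static String extractText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            StringBuilder out = new StringBuilder();
            NodeList strings = doc.getElementsByTagName("string");
            for (int i = 0; i < strings.getLength(); i++) {
                appendText(strings.item(i), out);
            }
            return out.toString();
        } catch (Exception e) {
            // Wrap parser and I/O errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Appends the text children of one <string> element, in document order.
    private static void appendText(Node parent, StringBuilder out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (child instanceof Element && "char".equals(child.getNodeName())) {
                // Assumption: the code attribute is a hexadecimal character code.
                String code = ((Element) child).getAttribute("code");
                out.append((char) Integer.parseInt(code, 16));
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample packet; the real response may differ.
        String sample = "<wddxPacket version='1.0'><header/><data><string>"
                + "Provider: HERO - 2.xx<char code='0a'/>"
                + "DBvendor=EPA<char code='0a'/>"
                + "Text-encoding=UTF-8"
                + "</string></data></wddxPacket>";
        System.out.println(extractText(sample));
    }
}
```

For the real data you would first download the response body (for example with `java.net.URL#openStream`) and pass it to `extractText`. If the packet contains several `<string>` elements, this sketch simply concatenates them.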

answered May 23 '17 at 18:34

wero


score 0 · Answer 3 · answered May 24 '17 at 17:21

You can use a library like Jsoup for parsing the file and getting the contents.

https://jsoup.org/cookbook/introduction/parsing-a-document

answered May 24 '17 at 17:21

Metalhead

