Questions tagged [cyberneko]

15 questions

votes

1 answer

Cleaning mixed type and So what happens is this: given Some Text, neko…

java html-sanitizing cyberneko

asked Apr 26 '10 at 12:39

Yossale

14,165
22
82
109

votes

1 answer

ClassName.class.getResourceAsStream returning Null

I migrated a project from Eclipse to Android Studio. The app compiles fine, but it crashes in the nekohtml library. Inside the HTMLEntities class: //filename = "res/HTMLlat1.properties" final InputStream stream =…

java android android-studio cyberneko

asked Jun 08 '14 at 11:32

Minas

1,422
16
29

vote

1 answer

How to fix a cyberneko self closing iframe is not recognizable in htmlunit?

I am currently trying to write a web scraping program using HtmlUnit. However, when I run it I receive this error: Exception in thread "main" com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser at…

java htmlunit cyberneko

asked May 11 '19 at 08:24

Gagak

vote

1 answer

serialize a NekoHTML ElementNSImpl object back to HTML/XML

Does anyone know if there is a straightforward way to serialize a parsed cyberneko ElementNSImpl object? Here is my example in Clojure of serializing the whole DOM (an HTMLDocumentImpl object). This works, but I have not yet figured out how to do…

java clojure cyberneko

asked Oct 11 '10 at 19:10

rplevy

5,393
3
32
31

vote

2 answers

Parsing html with cyberneko to find a 'div'-tag

I need one specific 'div' tag (identified by 'id') from an HTML page. To parse the page I'm using cyberneko: def doc = new XmlParser( new org.cyberneko.html.parsers.SAXParser() ).parse(htmlFile) divTag = doc.depthFirst().DIV.find{ it['@id']…

java xml groovy cyberneko

asked Dec 29 '09 at 13:26

domi

2,167
1
28
45

votes

1 answer

Comments getting escaped with NekoHTML (or JTidy) + XOM

I'm using NekoHTML to clean up some HTML, and then feeding it to XOM to get an object model. Somewhere in the course of this, comments are getting escaped. Here's a relevant example of the input HTML (most of it cut for clarity):

jtidy xom cyberneko

asked Nov 16 '11 at 19:28

David Moles

48,006
27
136
235

votes

1 answer

Parse html document with NekoHTML

I am using the NekoHTML framework with Xerces version 2.11.0 to parse an HTML document, but I am having a problem with this simple code: DOMParser parser = new DOMParser(); System.out.println(parser.getClass().toString()); InputSource url = new…

java html parsing cyberneko

asked Oct 11 '11 at 16:25

tt0686

1,771
6
31
60

votes

1 answer

cyberneko html settings to ignore unencoded greater than and less than symbol

I have HTML content that contains greater-than and less-than symbols, but those symbols are not encoded as &lt; and &gt;. To balance the tags in the content I pass it through the cyberneko HTML parser. After parsing content in between those…

java cyberneko

asked Mar 29 '11 at 09:56

Roshan

2,019
8
36
56

votes

1 answer

Processing XML comments using SAX & Cyberneko - in DOM order

I'm using cyberneko to clean and process html documents. I need to be able to process all the comments that occur in the original html documents. I've configured the cyberneko sax parser to process comments like…

xml sax cyberneko

asked Jan 15 '11 at 13:28

Joel

29,538
35
110
138

votes

1 answer

Groovy: CyberNeko | User Agents | Browser Version

I'm currently using CyberNeko in an attempt to grab information I want from a website. However, I believe the website checks the user agent/browser version to keep from just grabbing the url content. I am aware of using htmlunit to change the…

html browser groovy version cyberneko

asked Nov 23 '10 at 22:27

StartingGroovy

2,802
9
47
66

votes

1 answer

XmlSlurper/NekoHTML document fragment parsing - No HTML or BODY tags wanted

Dear All, I am trying to parse the following HTML fragment, and I would like to get the same fragment as output (without HTML and BODY tags). Is this possible? If so, how? Thank you Misha p.s. I am reading…

groovy fragment xmlslurper cyberneko

asked Jun 11 '10 at 16:31

Миша Кошелев

1,483
1
24
41

votes

1 answer

XmlUtil.serialize : Outputs tags in uppercase

I am trying to create a valid html document from html String content = getContent() def parser = new org.cyberneko.html.parsers.SAXParser() parser.setFeature('http://xml.org/sax/features/namespaces', false) def slurper = new…

groovy xmlslurper cyberneko

asked May 20 '14 at 14:09

Sudhir N

4,008
1
22
32

votes

1 answer

How to get html content using CyberNeko?

def page = new XmlSlurper(new SAXParser()).parse(url) println page.body[0] I want output Header where my html is: Title …

html groovy xmlslurper cyberneko

asked Mar 28 '14 at 13:45

Xelian

16,680
25
99
152

votes

0 answers

Parsing html string to dom document using cyberneko

I'm trying to parse an HTML string to a w3c dom document using neko html but my document is always null. This is the code I use: try { String html = readFile("C:/Users/thomas/Desktop/test.html"); InputStream is = new…

java html-parsing domdocument cyberneko

asked Dec 10 '12 at 08:28

thommie

votes

2 answers

When using HtmlUnit, how can I configure the underlying NekoHtml parser?

I'm using HtmlUnit to try to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.) The issue relates to a feature of the underlying NekoHtml parser: …

java htmlunit cyberneko

asked Jun 21 '12 at 13:08

Erik