Questions tagged [cyberneko]
15 questions
2
votes
1 answer

Yossale
- 14,165
- 22
- 82
- 109
2
votes
1 answer
ClassName.class.getResourceAsStream returning Null
I migrated project from Eclipse to Android Studio.
App compiles fine, but it has a crash related to nekohtml library.
Inside HTMLEntities class
//filename = "res/HTMLlat1.properties"
final InputStream stream =…

Minas
- 1,422
- 16
- 29
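The excerpt above is truncated, so the actual cause of the null stream is not known here. As general background, this is a minimal sketch of how `Class.getResourceAsStream` resolves names: a name without a leading slash is looked up relative to the package of the class it is called on, a name with a leading slash is looked up from the classpath root, and a missing resource yields `null` rather than an exception. `ResourceLookup` and `exists` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

public class ResourceLookup {

    // Hypothetical helper: reports whether a resource is visible from the given class.
    // getResourceAsStream returns null (it does not throw) when nothing is found.
    public static boolean exists(Class<?> anchor, String path) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(path)) {
            return in != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Relative name: resolved against the package of ResourceLookup.
        System.out.println(exists(ResourceLookup.class, "res/definitely-missing.properties"));

        // Absolute name (leading slash): resolved from the classpath root.
        System.out.println(exists(Object.class, "/java/lang/Object.class"));
    }
}
```

Checking the stream for `null` before using it turns a later `NullPointerException` into an explicit "resource not packaged" diagnosis.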
1
vote
1 answer
How to fix a cyberneko self-closing iframe is not recognizable in htmlunit?
I am currently trying to write a web scraping program using HtmlUnit. However, when I ran it I received this error:
Exception in thread "main" com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser
at…

Gagak
- 129
- 2
- 13
1
vote
1 answer
serialize a NekoHTML ElementNSImpl object back to HTML/XML
Does anyone know if there is a straightforward way to serialize a parsed cyberneko ElementNSImpl object?
Here is my example in Clojure of serializing the whole DOM (an HTMLDocumentImpl object). This works, but I have not yet figured out how to do…

rplevy
- 5,393
- 3
- 32
- 31
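One general way to serialize a single DOM node, rather than the whole document, is the JDK's identity `Transformer`. The sketch below assumes the parsed element implements `org.w3c.dom.Node` (Xerces' `ElementNSImpl` does). It uses the JDK's own XML parser as a stand-in for NekoHTML so that it runs without extra libraries. `NodeSerializer`, `serialize`, and `parseXml` are hypothetical names, and this is not necessarily the approach taken in the linked answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeSerializer {

    // Serializes any DOM node (document, element, ...) with an identity transform.
    public static String serialize(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header, since we are emitting a fragment.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Stand-in for the NekoHTML parse step: builds a DOM from well-formed markup.
    public static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml("<div id=\"a\"><p>hello</p></div>");
        Node paragraph = doc.getDocumentElement().getFirstChild();
        System.out.println(serialize(paragraph));
    }
}
```

The same `serialize` call accepts the whole document or any sub-element, so only the node passed to `DOMSource` changes.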
1
vote
2 answers
Parsing html with cyberneko to find a 'div'-tag
I need one specific 'div' tag (identified by 'id') from an HTML site. To parse the page I'm using cyberneko.
def doc = new XmlParser( new org.cyberneko.html.parsers.SAXParser() ).parse(htmlFile)
divTag = doc.depthFirst().DIV.find{ it['@id']…

domi
- 2,167
- 1
- 28
- 45
0
votes
1 answer
Comments getting escaped with NekoHTML (or JTidy) + XOM
I'm using NekoHTML to clean up some HTML, and then feeding it to XOM to get an object model. Somewhere in the course of this, comments are getting escaped.
Here's a relevant example of the input HTML (most of it cut for clarity):

David Moles
- 48,006
- 27
- 136
- 235
0
votes
1 answer
Parse html document with NekoHTML
I am using the NekoHTML framework with Xerces version 2.11.0 to parse an HTML document.
But I am having a problem with this simple code:
DOMParser parser = new DOMParser();
System.out.println(parser.getClass().toString());
InputSource url = new…

tt0686
- 1,771
- 6
- 31
- 60
0
votes
1 answer
cyberneko html settings to ignore unencoded greater than and less than symbol
I have HTML content that contains greater-than and less-than symbols, but those symbols are not encoded as &lt; and &gt;. To balance tags in the content I pass the content through the cyberneko HTML parser. After parsing content in between those…

Roshan
- 2,019
- 8
- 36
- 56
0
votes
1 answer
Processing XML comments using SAX & Cyberneko - in DOM order
I'm using cyberneko to clean and process html documents.
I need to be able to process all the comments that occur in the original html documents.
I've configured the cyberneko sax parser to process comments like…

Joel
- 29,538
- 35
- 110
- 138
0
votes
1 answer
Groovy: CyberNeko | User Agents | Browser Version
I'm currently using CyberNeko in an attempt to grab information I want from a website. However, I believe the website checks the user agent/browser version to keep from just grabbing the url content.
I am aware of using htmlunit to change the…

StartingGroovy
- 2,802
- 9
- 47
- 66
0
votes
1 answer
XmlSlurper/NekoHTML document fragment parsing - No HTML or BODY tags wanted
Dear All, I am trying to parse the following HTML fragment, and I would like to get the same fragment as output (without HTML and BODY tags). Is this possible? If so, how?
Thank you
Misha
p.s. I am reading…

Миша Кошелев
- 1,483
- 1
- 24
- 41
0
votes
1 answer
XmlUtil.serialize : Outputs tags in uppercase
I am trying to create a valid html document from html
String content = getContent()
def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def slurper = new…

Sudhir N
- 4,008
- 1
- 22
- 32
0
votes
1 answer
How to get html content using CyberNeko?
def page = new XmlSlurper(new SAXParser()).parse(url)
println page.body[0]
I want output
Title
…
Header
where my html is:
Xelian
- 16,680
- 25
- 99
- 152
0
votes
0 answers
Parsing html string to dom document using cyberneko
I'm trying to parse an HTML string to a W3C DOM document using NekoHTML, but my document is always null. This is the code I use:
try {
String html = readFile("C:/Users/thomas/Desktop/test.html");
InputStream is = new…

thommie
- 438
- 5
- 22
0
votes
2 answers
When using HtmlUnit, how can I configure the underlying NekoHtml parser?
I'm using HtmlUnit to try to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.)
The issue relates to a feature of the underlying NekoHtml parser:
…

Erik
- 997
- 4
- 14
- 24