Questions tagged [tag-soup]

TagSoup is a SAX-compliant parser written in Java that parses HTML as it is found in the wild.
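
Because TagSoup exposes the standard SAX `XMLReader` interface, several of the questions below come down to routing SAX events into a DOM tree. The following is a minimal sketch of that pattern using only JAXP. The class name `SaxToDom` and the helper `toDom` are hypothetical names chosen for illustration. The JDK's built-in reader stands in for TagSoup so the example runs without extra jars, which means the input here must be well-formed. With TagSoup on the classpath, passing `new org.ccil.cowan.tagsoup.Parser()` as the reader is assumed to work the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxToDom {
    // Hypothetical helper: builds a DOM tree from whatever SAX events the reader emits.
    static Node toDom(XMLReader reader, String markup) throws Exception {
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, result);
        return result.getNode();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK; a TagSoup Parser would be swapped in here.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Node doc = toDom(reader, "<html><body><p>Hello</p></body></html>");
        System.out.println(doc.getFirstChild().getNodeName());
    }
}
```

The returned node is a DOM `Document`, so it can be handed to XPath or any other DOM-based API.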

38 questions

5 answers

How to get an attribute from an XMLReader

I have some HTML that I'm converting to a Spanned using Html.fromHtml(...), and I have a custom tag that I'm using in it: So I've implemented a TagHandler to handle this custom tag, like so: public void handleTag( boolean…

asked Aug 05 '11 at 06:22

Jason Robinson

1 answer

jTidy and TagSoup documentation

I'm looking for documentation (official documentation, if possible) for the TagSoup and jTidy libraries. I want to use these libraries to manipulate HTML "tagsoup" files that include XML tags with different namespaces mixed in with the HTML (html, xhtml…

java jtidy tag-soup jericho-html-parser

asked Dec 15 '10 at 16:49

angelcervera

3 answers

Tagsoup fails to parse html document from a StringReader ( java )

I have this function: private Node getDOM(String str) throws SearchEngineException { DOMResult result = new DOMResult(); try { XMLReader reader = new Parser(); …

java string tag-soup stringreader

asked Feb 21 '10 at 00:07

zajcev

1 answer

XPath Expression returns nothing for //element, but //* returns a count

I'm using XOM with the following sample data: Element root = cleanDoc.getRootElement(); //find all the bold elements, as those mark institution and clinic. Nodes nodes = root.query("//*");

java xpath xml-namespaces xom tag-soup

asked Feb 24 '10 at 01:56

Stefan Kendall

1 answer

Using a SAX parser when I need a DocumentBuilder

XMLBeam is a nice XML to POJO unmarshaler (via XPath), but it only allows you to configure a DocumentBuilder or DocumentBuilderFactory. TagSoup is a nice SAX parser that lets you parse nasty HTML documents as though they were XML. I would like to…

java xpath xml-parsing sax tag-soup

asked Mar 23 '14 at 19:20

Neil McGuigan

2 answers

Extract URL from href-tag in groovy

I need to parse a malformed HTML-page and extract certain URLs from it as any kind of Collection. I don't really care what kind of Collection, I just need to be able to iterate over it. Let's say we have a structure like this: …

groovy xmlslurper tag-soup

asked Mar 17 '13 at 16:01

Jakunar

1 answer

Wrap a tag around plain html text

I have this structure in my html document:

"You began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "You were Mr. Bingley's first choice."

But I need my "plain…

java regex jsoup text-parsing tag-soup

asked Mar 22 '12 at 12:59

Richard

1 answer

TagSoup and XPath

I'm trying to use TagSoup with XPath (JAXP). I know how to obtain a SAX parser (or XMLReader) from TagSoup, but I couldn't find out how to create a DocumentBuilder that uses that SAX parser. How do I do that? Thank you. EDIT: Sorry for being so general…

java xpath tag-soup

asked Jul 21 '11 at 21:46

IgorY

3 answers

Strange behavior with tagsoup and Groovy's XmlSlurper

Let's say I want to parse the phone number from an XML string like this: str = """

123 New York, NY 10019

(212) 212-0001

…

xml parsing groovy tag-soup

asked Jan 27 '11 at 02:44

user308808

1 answer

Point TagSoup Parser to use HTML5 version

I want TagSoup to follow the HTML5 standard. I am using the TagSoup Parser, which adheres to HTML4, and HTML4 doesn't allow a

inside an tag, so the HTML is parsed incorrectly. However, HTML5 allows this. How do I make the tagsoup…

html tag-soup

asked Sep 03 '15 at 12:03

Anish Somani

0 answers

parsing HTML5 with Enlive/Tagsoup/JSoup

HTML5 allows tags to appear in the body, but Enlive does not seem to support this: (deftest test-enlive (testing "enlive" (let [html-as-string "

the…

clojure enlive tag-soup

asked Feb 05 '15 at 03:08

George Armhold

1 answer

How to use JAXB with HTML?

I would like to unmarshal some nasty HTML to a Java object using JAXB. (I'm on Java 7). Tagsoup is a SAX-compliant XML parser that can handle nasty HTML. How can I set up JAXB to use Tagsoup for unmarshalling HTML? I tried setting…

jaxb sax tag-soup

asked Jul 16 '14 at 21:51

Neil McGuigan

1 answer

Parsing XML in Groovy with namespace and entities

Parsing XML in Groovy should be a piece of cake, but I always run into problems. I would like to parse a string like this:

This is a test with some formattings.
And this has a…

groovy html-parsing xmlslurper tag-soup

asked Aug 18 '13 at 08:53

rdmueller

1 answer

Jtidy StringIndexOutOfBoundsException in Jmeter

I want to retrieve content from a webpage using JMeter. The data I'm looking for is inside a javascript block : (...)