Swing HTMLDocument doesn't contain all the HTML elements?

Question

I have a Swing application that contains a dialog displaying an HTML file. I do this like so:

URL url = getClass().getResource("/resources/EULA.html");

JDialog eulaDialog = new JDialog();
JEditorPane eulaEP = new JEditorPane();
try {
  eulaEP.setPage(url);
} catch (IOException e1) {
  // TODO Auto-generated catch block
  e1.printStackTrace();
}

JScrollPane scrollPane = new JScrollPane(eulaEP);
eulaDialog.getContentPane().add(scrollPane);
eulaDialog.setBounds(400, 200, 700, 600);
eulaDialog.setVisible(true);

This works as I expected. It displays the HTML file correctly.

I now try to find an element in the document so I do this:

HTMLDocument htm = (HTMLDocument) eulaEP.getDocument();
Element el = htm.getElement("unique_id");

But this returns null (even though the element is shown in the dialog).

I decided to check which elements were held in the document by doing this:

for (ElementIterator iterator = new ElementIterator(htm); iterator.next() != null;)

This, however, returned only 4 elements: html, body, p and content. My HTML file has a lot more (and what is content anyway?). What am I doing wrong?

Just to clarify: the HTML contains a button, and I want to add an ActionListener to it so I can catch a click on it in my Java code.

*"the HTML contains a button, I want to add an ActionListener to this button"* Good luck with that! I'd put a button elsewhere on the GUI. For better help sooner, post a [MCVE] or [Short, Self Contained, Correct Example](http://www.sscce.org/). Hard code the HTML in a `String` in the source code. — Andrew Thompson, Jun 03 '16 at 13:05
`eulaDialog.setBounds(400, 200, 700, 600);` Java GUIs have to work on different OS', screen size, screen resolution etc. using different PLAFs in different locales. As such, they are not conducive to pixel perfect layout. Instead use layout managers, or [combinations of them](http://stackoverflow.com/a/5630271/418556) along with layout padding and borders for [white space](http://stackoverflow.com/a/17874718/418556). — Andrew Thompson, Jun 03 '16 at 13:09
I don't understand your last comment. I am creating a dialog, setBounds just sets the width, height and x,y position of the window on the screen doesn't it? What has that got to do with layout managers? — Lieuwe, Jun 03 '16 at 13:45
*"I don't understand your last comment."* When I wrote that, I thought it was a `JComponent` being positioned in a `JPanel`. Having said that, there are probably better ways to position and size it. Where is that MCVE? — Andrew Thompson, Jun 03 '16 at 14:08

score 1 · Answer 1 · answered Jun 03 '16 at 15:44

My guess is that you’re reading the document before it’s fully loaded. The documentation for JEditorPane.setPage is pretty informative on this:

This may load either synchronously or asynchronously depending upon the document returned by the EditorKit. … If the document is loaded asynchronously, the document will be installed into the editor immediately using a call to setDocument which will fire a document property change event, then a thread will be created which will begin doing the actual loading. In this case, the page property change event will not be fired by the call to this method directly, but rather will be fired when the thread doing the loading has finished. It will also be fired on the event-dispatch thread.

So you should not be looking at the document until it’s loaded. For example:

JEditorPane eulaEP = new JEditorPane();
eulaEP.addPropertyChangeListener("page", e -> {
    HTMLDocument htm = (HTMLDocument) eulaEP.getDocument();
    Element el = htm.getElement("unique_id");
    // ...
});

try {
  eulaEP.setPage(url);
} catch (IOException e1) {
  // TODO Auto-generated catch block
  e1.printStackTrace();
}
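
If waiting for the page property is inconvenient, one possible alternative (a sketch, not part of the documentation quoted above) is to parse the HTML yourself with HTMLEditorKit.read, which runs on the calling thread, so the document is complete when the call returns. The class name SyncHtmlLookup and the method textOfElementById are made up for illustration, and the HTML string is a stand-in for the EULA file:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: names are illustrative, not from the question or answer.
class SyncHtmlLookup {

    /**
     * Parses the HTML on the calling thread and returns the trimmed text of
     * the element whose id attribute equals the given id, or null if no such
     * element exists.
     */
    static String textOfElementById(String html, String id) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            // read() returns only after the whole input has been parsed.
            kit.read(new StringReader(html), doc, 0);
            Element el = doc.getElement(id);
            if (el == null) {
                return null;
            }
            int start = el.getStartOffset();
            // Clamp the end offset to the document length before reading text.
            int end = Math.min(el.getEndOffset(), doc.getLength());
            return doc.getText(start, end - start).trim();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body>"
            + "<div id=\"unique_id\">Test</div>"
            + "</body></html>";
        System.out.println(textOfElementById(html, "unique_id"));
    }
}
```

A document built this way could then be handed to the editor pane with setEditorKit and setDocument, though for a file loaded from a URL the property change listener above is probably the simpler route.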

trashgod · Answer 2 · 2016-06-04T14:48:49.110

In the example below, HTMLDocument::getElement(String id) finds the Element whose HTML.Attribute.id attribute has the value "unique_id". The Element found is BranchElement(div) 1,6, printed on the first line of the console output.

I'm not sure where your Element iteration goes awry, but you can see the unique_id value in the BranchElement(div) in the console output below. Because an HTMLDocument models HTML, the enclosed HTMLReader may synthesize HTML.Tag CONTENT, such as the content in the implied paragraphs seen below.

Console:

BranchElement(div) 1,6

Element: 'BranchElement(html) 0,6', name: 'html', children: 2, attributes: 1, leaf: false
  Attribute: 'name', Value: 'html'
Element: 'BranchElement(head) 0,1', name: 'head', children: 1, attributes: 1, leaf: false
  Attribute: 'name', Value: 'head'
Element: 'BranchElement(p-implied) 0,1', name: 'p-implied', children: 1, attributes: 1, leaf: false
  Attribute: 'name', Value: 'p-implied'
Element: 'LeafElement(content) 0,1', name: 'content', children: 0, attributes: 2, leaf: true
  Attribute: 'CR', Value: 'true'
  Attribute: 'name', Value: 'content'
    Content (0-1): ''
Element: 'BranchElement(body) 1,6', name: 'body', children: 1, attributes: 1, leaf: false
  Attribute: 'name', Value: 'body'
Element: 'BranchElement(div) 1,6', name: 'div', children: 1, attributes: 3, leaf: false
  Attribute: 'align', Value: 'center'
  Attribute: 'id', Value: 'unique_id'
  Attribute: 'name', Value: 'div'
Element: 'BranchElement(p-implied) 1,6', name: 'p-implied', children: 2, attributes: 1, leaf: false
  Attribute: 'name', Value: 'p-implied'
Element: 'LeafElement(content) 1,5', name: 'content', children: 0, attributes: 1, leaf: true
  Attribute: 'name', Value: 'content'
    Content (1-5): 'Test'
Element: 'LeafElement(content) 5,6', name: 'content', children: 0, attributes: 2, leaf: true
  Attribute: 'CR', Value: 'true'
  Attribute: 'name', Value: 'content'
    Content (5-6): ''

Code:

import java.awt.EventQueue;
import java.util.Enumeration;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;

/**
 * @see http://stackoverflow.com/a/5614370/230513
 */
public class Test {

    private static final String TEXT
        = "<html>"
        + "<head></head>"
        + "<body>"
        + "<div align=center id=unique_id>Test</div>"
        + "</body>"
        + "</html>";

    public static void main(String[] args) throws Exception {
        EventQueue.invokeLater(new Test()::display);
    }

    private void display() {
        JFrame f = new JFrame("Test");
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        JEditorPane jep = new JEditorPane("text/html", TEXT);
        jep.setEditable(false);
        f.add(jep);
        f.pack();
        f.setLocationRelativeTo(null);
        f.setVisible(true);

        HTMLDocument htmlDoc = (HTMLDocument) jep.getDocument();
        System.out.println(htmlDoc.getElement("unique_id"));
        ElementIterator iterator = new ElementIterator(htmlDoc);
        Element element;
        while ((element = iterator.next()) != null) {
            try {
                printElement(htmlDoc, element);
            } catch (BadLocationException e) {
                e.printStackTrace(System.err);
            }
        }
    }

    private void printElement(HTMLDocument htmlDoc, Element element) throws BadLocationException {
        AttributeSet attrSet = element.getAttributes();
        System.out.println(""
            + "Element: '" + element.toString().trim()
            + "', name: '" + element.getName()
            + "', children: " + element.getElementCount()
            + ", attributes: " + attrSet.getAttributeCount()
            + ", leaf: " + element.isLeaf());
        Enumeration<?> attrNames = attrSet.getAttributeNames();
        while (attrNames.hasMoreElements()) {
            Object attr = attrNames.nextElement();
            System.out.println("  Attribute: '" + attr + "', Value: '"
                + attrSet.getAttribute(attr) + "'");
            Object tag = attrSet.getAttribute(StyleConstants.NameAttribute);
            if (attr == StyleConstants.NameAttribute
                && tag == HTML.Tag.CONTENT) {
                int startOffset = element.getStartOffset();
                int endOffset = element.getEndOffset();
                int length = endOffset - startOffset;
                System.out.printf("    Content (%d-%d): '%s'%n", startOffset,
                    endOffset, htmlDoc.getText(startOffset, length).trim());
            }
        }
    }
}
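
For a quicker check of which elements a document holds, a shorter variant (a sketch under the same assumptions as the code above, with the made-up names ElementNames and namesOf) collects only the element names in the order ElementIterator visits them. It parses the HTML directly with HTMLEditorKit instead of going through a JEditorPane, so no window is needed:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: names are illustrative only.
class ElementNames {

    /** Returns the name of every element in the parsed document, depth first. */
    static List<String> namesOf(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            // Parse on the calling thread so the tree is complete afterwards.
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        List<String> names = new ArrayList<>();
        ElementIterator iterator = new ElementIterator(doc);
        Element element;
        // The first call to next() yields the root element.
        while ((element = iterator.next()) != null) {
            names.add(element.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body>"
            + "<div align=center id=unique_id>Test</div>"
            + "</body></html>";
        System.out.println(namesOf(html));
    }
}
```

Names such as content and p-implied in the result are the synthesized elements described above, not tags from the HTML source.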

I've updated my answer based on a closer reading of your question. I thought you wanted the value of the attribute named `unique_id`; it looks like you want the [`id`](http://www.w3schools.com/tags/att_global_id.asp) having the value `"unique_id"`. — trashgod, Jun 04 '16 at 14:58
