
Could somebody please provide an example of parsing HTML into a list of elements using XMLWorkerHelper in iTextSharp (C#)?

The Java version, as given in the documentation, is:

XMLWorkerHelper.getInstance().parseXHtml(new ElementHandler() {
    public void add(final Writable w) {
        if (w instanceof WritableElement) {
            List<Element> elements = ((WritableElement) w).elements();
            // write class names of elements to file
        }
    }
}, HTMLParsingToList.class.getResourceAsStream("/html/walden.html"));
p.s.w.g
Joseph

1 Answer


You need to implement the IElementHandler interface in a class of your own:

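using System.Collections.Generic;
using iTextSharp.text;                 //IElement
using iTextSharp.tool.xml;             //IElementHandler, IWritable
using iTextSharp.tool.xml.pipeline;    //WritableElement

public class SampleHandler : IElementHandler {
    //Generic list of elements
    public List<IElement> elements = new List<IElement>();
    //Called by the parser for each parsed item; keep only the PDF elements
    public void Add(IWritable w) {
        if (w is WritableElement) {
            elements.AddRange(((WritableElement)w).Elements());
        }
    }
}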

Instead of using a file stream, here's an example that parses a string. To parse a file instead, replace the StringReader with a StreamReader.

    //Requires: using System.IO; (TextReader, StringReader) and using iTextSharp.tool.xml; (XMLWorkerHelper)
    string html = "<html><head><title>Test Document</title></head><body><p>This is a test. <strong>Bold <em>and italic</em></strong></p><ol><li>Dog</li><li>Cat</li></ol></body></html>";
    //Instantiate our handler
    var mh = new SampleHandler();
    //Bind a reader to our text
    using (TextReader sr = new StringReader(html)) {
        //Parse; the handler's Add method is called for each parsed item
        XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
    }

    //Loop through each element
    foreach (var element in mh.elements) {
        //Loop through each chunk in each element
        foreach (var chunk in element.Chunks) {
            //Do something
        }
    }
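One common thing to do at the "Do something" step is to place the collected elements into a table cell. The following is a minimal sketch, assuming iTextSharp 5.x with the XMLWorker add-on; the class names `CollectingHandler` and `HtmlToCellSketch` are hypothetical and only for illustration. It parses the HTML with a handler equivalent to `SampleHandler`, adds every element to a single `PdfPCell` via `AddElement`, and writes the PDF to a `MemoryStream`. It does not address fonts: non-Latin text additionally needs a font that actually contains those glyphs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.pipeline;

// Hypothetical handler, equivalent to SampleHandler: collects every parsed IElement.
public class CollectingHandler : IElementHandler {
    public List<IElement> Elements = new List<IElement>();

    public void Add(IWritable w) {
        var we = w as WritableElement;
        if (we != null) {
            Elements.AddRange(we.Elements());
        }
    }
}

// Hypothetical helper: HTML -> elements -> one PdfPCell -> PDF bytes.
public static class HtmlToCellSketch {
    public static byte[] Render(string html) {
        var handler = new CollectingHandler();
        using (var sr = new StringReader(html)) {
            XMLWorkerHelper.GetInstance().ParseXHtml(handler, sr);
        }

        using (var ms = new MemoryStream()) {
            using (var doc = new Document()) {
                using (var writer = PdfWriter.GetInstance(doc, ms)) {
                    doc.Open();
                    var table = new PdfPTable(1);
                    var cell = new PdfPCell();
                    foreach (var element in handler.Elements) {
                        // AddElement puts the cell in composite mode,
                        // so paragraphs and lists keep their own formatting
                        cell.AddElement(element);
                    }
                    table.AddCell(cell);
                    doc.Add(table);
                    doc.Close();
                }
            }
            // ToArray still works after the writer has closed the stream
            return ms.ToArray();
        }
    }

    public static void Main() {
        string html = "<html><body><p>This is a <strong>test</strong>.</p><ol><li>Dog</li><li>Cat</li></ol></body></html>";
        byte[] pdf = Render(html);
        Console.WriteLine(pdf.Length);
    }
}
```

Using composite mode (`AddElement`) rather than the `PdfPCell(Phrase)` constructor matters here, because XMLWorker returns block-level elements such as `Paragraph` and `List` that text mode would not lay out correctly.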
Chris Haas
  • What is SampleHandler? Can you please explain? @Chris Haas – Jamshaid K. Mar 23 '17 at 13:03
    `SampleHandler` is a custom class that implements iText's `IElementHandler` interface. This is completely custom code that you can do whatever you want with as long as you follow the interface's contract. – Chris Haas Mar 23 '17 at 13:25
  • Actually, I am converting HTML to PDF, but I am unable to get the Unicode characters when ParseToElementList is called. I was able to do it without any errors using ParseXHtml, but that way I am unable to add the results to my PdfPCells. Can you point me in the right direction? @Chris Haas – Jamshaid K. Mar 23 '17 at 13:29