I need to read a PDF file and convert it to HTML. I'm currently using iTextSharp to read the PDF. Is there a DLL with proper documentation for reading PDF files?

Thanks

Sam
  • Check this http://stackoverflow.com/questions/2295555/how-to-convert-pdf-into-html-using-c-sharp – Matt Jul 13 '12 at 10:53

2 Answers

iTextSharp is pretty decent and quite easy to use. Here is a small example that reads one page of a PDF, puts its text into a string, and displays that string in a label on a Web Forms page:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace pdfreadertest
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            GetTextFromPDFFile(@"c:\example.pdf", 1);
        }

        public void GetTextFromPDFFile(string pdfFile, int pageNumber)
        {
            // Open the PDF file
            PdfReader pdfReader = new PdfReader(pdfFile);
            string pdfText;

            try
            {
                // Extract the text of the requested page (page numbers start at 1)
                pdfText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber);
            }
            finally
            {
                // Always release the file, even if extraction throws
                pdfReader.Close();
            }

            // Put the string (pdf text) into a label to display on page
            this.lblPdfText.Text = pdfText;
        }
    }
}
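
Since the question asks for HTML rather than plain text, the same calls can be extended to every page. What follows is only a rough sketch, not a full converter: it keeps the text and drops layout, fonts and images. The class name PdfToHtmlSketch and the helpers PagesToHtml and ConvertFile are made-up names for illustration; PdfReader.NumberOfPages and PdfTextExtractor.GetTextFromPage are the iTextSharp 5 calls used above, and WebUtility.HtmlEncode assumes .NET 4.0 or later.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfToHtmlSketch
{
    // Hypothetical helper: wraps each page's plain text in a <div>,
    // with one <p> per line of extracted text.
    public static string PagesToHtml(IEnumerable<string> pageTexts)
    {
        var html = new StringBuilder("<html><body>\n");
        int pageNumber = 1;
        foreach (string pageText in pageTexts)
        {
            html.AppendFormat("<div class=\"page\" id=\"page-{0}\">\n", pageNumber++);
            foreach (string line in pageText.Split('\n'))
            {
                // Encode so characters such as < and & from the PDF stay literal
                html.Append("<p>")
                    .Append(WebUtility.HtmlEncode(line.TrimEnd('\r')))
                    .Append("</p>\n");
            }
            html.Append("</div>\n");
        }
        return html.Append("</body></html>").ToString();
    }

    // Hypothetical helper: reads every page with iTextSharp and
    // returns the extracted text as simple HTML.
    public static string ConvertFile(string pdfFile)
    {
        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            var pages = new List<string>();
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                pages.Add(PdfTextExtractor.GetTextFromPage(reader, i));
            }
            return PagesToHtml(pages);
        }
        finally
        {
            // Release the file even if extraction fails
            reader.Close();
        }
    }
}
```

Calling PdfToHtmlSketch.ConvertFile(@"c:\example.pdf") would return one string of HTML for the whole document, which could then be written to a file or to the response.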

Hope that helps.

Danny Brady

I think iTextSharp is one of the most popular, although there are several other libraries such as iText.Net, PDFsharp and sharpPDF. Search for them and you will find plenty. I have used iTextSharp and I like it.

Moble Joseph
  • The OP said he is using iTextsharp already, so could you please elaborate what your answer is about? – yms Jul 24 '12 at 14:34