I need to read a PDF file and convert it to HTML. Currently I'm using iTextSharp to read the PDF. Is there a DLL with proper documentation for reading PDF files?
Thanks
iTextSharp is pretty decent and quite easy to use. Here is a small example that reads one page of a PDF, puts its text into a string, and displays that string in a label on a Web Forms page:
using System;
using System.Web.UI;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace pdfreadertest
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Page numbers in iTextSharp start at 1
            GetTextFromPDFFile(@"c:\example.pdf", 1);
        }

        public void GetTextFromPDFFile(string pdfFile, int pageNumber)
        {
            // Open the PDF file
            PdfReader pdfReader = new PdfReader(pdfFile);
            string pdfText;
            try
            {
                // Extract the text of the requested page into a string
                pdfText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber);
            }
            finally
            {
                // Always close the reader, even if extraction fails
                pdfReader.Close();
            }

            // Put the string (PDF text) into a label to display on the page
            this.lblPdfText.Text = pdfText;
        }
    }
}
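Since the question asks for HTML rather than a label, here is a rough sketch of how the extracted text could be wrapped in markup. It is only an assumption about what "convert to HTML" should mean here: the `PdfTextToHtml` class and its `BuildHtml` method are hypothetical names, not part of iTextSharp, and the output is plain paragraphs with no layout, fonts, or images from the PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper (not part of iTextSharp): wraps already-extracted
// page texts in a very basic HTML document.
public static class PdfTextToHtml
{
    public static string BuildHtml(IEnumerable<string> pageTexts)
    {
        var html = new StringBuilder();
        html.Append("<html><body>\n");

        int pageNumber = 1;
        foreach (string pageText in pageTexts)
        {
            // One <div> per PDF page
            html.Append("<div class=\"page\" id=\"page-" + pageNumber + "\">\n");

            // One <p> per non-empty line of extracted text
            string[] lines = (pageText ?? "").Split(
                new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string line in lines)
            {
                // Encode <, > and & so PDF text cannot break the markup
                html.Append("<p>" + WebUtility.HtmlEncode(line) + "</p>\n");
            }

            html.Append("</div>\n");
            pageNumber++;
        }

        html.Append("</body></html>\n");
        return html.ToString();
    }
}
```

To feed it, you could loop from 1 to `pdfReader.NumberOfPages`, call `PdfTextExtractor.GetTextFromPage(pdfReader, i)` for each page, collect the results in a list, and pass that list to `BuildHtml`.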
Hope that helps.
I think iTextSharp is one of the most popular, though there are several other libraries such as iText.Net, PDF Sharp and Sharp PDF. Search for them and you will find many more. I have used iTextSharp and I like it.