I want to convert a PDF file to .XLS format in ASP.NET using C#. Is this possible?
It is certainly possible. Read the PDF file with one of the PDF-reading libraries for C# (PdfSharp) and then create an XLS file from the data you read, using a library for that (I have used http://epplus.codeplex.com and recommend it). – okisinch Mar 23 '15 at 10:03
3 Answers
It's not clear to me exactly what you're trying to achieve, but if I were you, I would split the problem into two:
- How can I read content from a PDF file? You can find some insights here.
- How can I create and write to an xls file in C#? There is already a great answer here.
If you ask more specific questions, you'll be able to get better answers.
I want to convert a PDF file to an .xls file. In short, the PDF file contains data in tabular form, and I want to open the same data in an Excel file. – Shree Mar 24 '15 at 12:23
You can use the PdfSharp library for this purpose.
using System;
using System.Diagnostics;
using System.IO;
using PdfSharp;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using PdfSharp.Pdf.Advanced;

namespace WorkOnPdfObjects
{
    class Program
    {
        static void Main()
        {
            // Copy the sample PDF into the working directory.
            const string filename = "Portable Document Format.pdf";
            File.Copy(Path.Combine("../../../../../PDFs/", filename),
                Path.Combine(Directory.GetCurrentDirectory(), filename), true);

            // Open the existing document for modification.
            PdfDocument document = PdfReader.Open(filename);

            // Build a /GoTo action dictionary with a destination array.
            PdfDictionary dict = new PdfDictionary(document);
            dict.Elements["/S"] = new PdfName("/GoTo");
            PdfArray array = new PdfArray(document);
            dict.Elements["/D"] = array;

            // The destination is the third page (index 2), so the
            // document must contain at least three pages.
            PdfReference iref = PdfInternals.GetReference(document.Pages[2]);
            array.Elements.Add(iref);
            array.Elements.Add(new PdfName("/FitV"));
            array.Elements.Add(new PdfInteger(-32768)); // left coordinate for /FitV

            // Register the action and make it the document's open action.
            document.Internals.AddObject(dict);
            document.Internals.Catalog.Elements["/OpenAction"] =
                PdfInternals.GetReference(dict);

            // Save the changes and open the file in the default viewer.
            document.Save(filename);
            Process.Start(filename);
        }
    }
}
I think this should help you.

I want to do this in a server-side button click event. Is there any other option? – Shree Mar 23 '15 at 10:10
Yes, convert this console program into a web service and invoke it, @Shree. – Karthikeyan Mar 23 '15 at 10:14
The solution depends on the complexity of the PDF documents you have. Some PDF files can be converted easily by writing out the text objects one after another in the order they appear inside the PDF, but because of the PDF format's design, that order is not guaranteed to match the visual layout.
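To illustrate the ordering problem described above, here is a minimal sketch of one way to rebuild visual rows from text objects that arrive in arbitrary order. It assumes your PDF library can report a position for each piece of text. The `TextFragment` class, the `TableRowBuilder` name, and the `tolerance` parameter are hypothetical and not part of any library; the sketch also assumes the usual PDF convention that Y grows upward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container: one text object plus its position on the page.
public sealed class TextFragment
{
    public double X { get; set; }
    public double Y { get; set; }
    public string Text { get; set; }
}

public static class TableRowBuilder
{
    // Groups fragments into visual rows. A fragment joins the current row
    // when its Y is within `tolerance` of the first fragment in that row.
    // Rows come out top to bottom (larger Y first), cells left to right.
    public static List<List<string>> ToRows(
        IEnumerable<TextFragment> fragments, double tolerance = 2.0)
    {
        var rows = new List<List<TextFragment>>();
        double currentRowY = 0;

        foreach (var fragment in fragments.OrderByDescending(t => t.Y))
        {
            bool sameRow = rows.Count > 0 &&
                           Math.Abs(currentRowY - fragment.Y) <= tolerance;
            if (sameRow)
            {
                rows[rows.Count - 1].Add(fragment);
            }
            else
            {
                rows.Add(new List<TextFragment> { fragment });
                currentRowY = fragment.Y;
            }
        }

        // Sort each row by X so the cells read left to right.
        return rows
            .Select(r => r.OrderBy(t => t.X).Select(t => t.Text).ToList())
            .ToList();
    }
}
```

This only handles simple tables with one line of text per cell. Multi-line cells, merged cells, and rotated pages would need more work, which is where the options below come in.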
There are some options:
- Use iTextSharp (open source) to read the PDF, then process each text object and create CSV output from it (enclose each text object in quotes, separate them with commas, and separate rows with a line break), like in this sample code.
- You may also use PDFBox.NET, a port of the powerful Apache PDFBox (Java) library, instead (it requires IKVM, a Java VM implemented in .NET).
- For complex PDF documents, you may use a specialized commercial solution like ByteScout PDF Extractor SDK, which is designed to extract tables from PDF as CSV or XLS.
// disclosure: I work for ByteScout
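The CSV step mentioned in the first option can be sketched without any PDF library. This is a minimal illustration, assuming the text has already been extracted into rows of strings; `CsvBuilder` and its methods are hypothetical names, and doubling embedded quotes is the common CSV convention rather than something stated in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvBuilder
{
    // Encloses a value in double quotes and doubles any embedded quote,
    // so commas and quotes inside a cell do not break the row.
    public static string Quote(string value)
    {
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Joins the cells of each row with commas and ends each row with CRLF.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(cell => Quote(cell))));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

The resulting string can be written with `File.WriteAllText` and opened directly in Excel. It is a CSV file rather than a true .xls workbook, so a spreadsheet library is still needed if the binary format is a hard requirement.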
