Hi all,

I have a PDF file with an XML file attached, and I need to parse that XML. Does anyone know how to do that? I'm using C#.

Thanks in advance.

Zorro

3 Answers

Try using LINQ to XML as suggested in this question.
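As a rough illustration of the LINQ to XML approach, here is a minimal sketch. It assumes the attachment has already been extracted from the PDF into a string, and the class name `AttachmentXml`, the method `ReadNames`, and the `<name>` element are hypothetical placeholders, not anything taken from your file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AttachmentXml
{
    // Returns the text of every <name> element found anywhere in the XML.
    // "name" is a hypothetical element; substitute the elements your file uses.
    public static List<string> ReadNames(string xml)
    {
        // XDocument.Parse throws XmlException if the input is not well-formed.
        XDocument doc = XDocument.Parse(xml);

        // Descendants walks the whole tree, so nesting depth does not matter.
        return doc.Descendants("name")
                  .Select(e => e.Value)
                  .ToList();
    }
}
```

Calling `AttachmentXml.ReadNames(xmlText)` returns the element values in document order. If the XML declares a default namespace, `Descendants` would need an `XName` that includes that namespace rather than a bare string.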

Oren Hizkiya

PDF files can carry a metadata information object. Is that what you mean, or is the XML file embedded in the PDF as a separate object?

mark stephens

I believe this blog post describing how to read from a PDF file using C# is what you want.

This is the example he gives of grabbing text from the PDF:

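using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

namespace PDFReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF from disk.
            PDDocument doc = PDDocument.load("lopreacamasa.pdf");
            try
            {
                // Extract the document's text and print it to the console.
                PDFTextStripper pdfStripper = new PDFTextStripper();
                Console.Write(pdfStripper.getText(doc));
            }
            finally
            {
                // Close the document to release the underlying file.
                doc.close();
            }
        }
    }
}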

Here is what looks like an exhaustive, well-organized list of ways to read PDFs with C#.

If what you need is some form of embedded metadata, as Mark suggested, I'm sure it can also be fetched using the tools I've linked to.

Oren Hizkiya