How to extract the title from a .pdf file using C#

Question

I know that a solution for this already exists for Python (pypdf), but I hope someone can suggest a C# library for it.

By "title" I mean the text shown in the window title when you open a .pdf file. – apros, Nov 15 '10 at 16:48

Darin Dimitrov · Accepted Answer · 2010-11-15T16:47:52.857

score 3

A commonly used library for manipulating PDF files in .NET is iTextSharp, which is a port of the Java iText library. Here's an example:

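using System;
using iTextSharp.text.pdf;

class Program
{
    static void Main()
    {
        // Open the PDF and read its document information (Info) dictionary.
        PdfReader reader = new PdfReader("test.pdf");
        // The Title entry is optional, so check for it before indexing.
        var title = reader.Info.ContainsKey("Title") ? reader.Info["Title"] : null;
        Console.WriteLine(title ?? "(no title set)");
        reader.Close(); // release the underlying file
    }
}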

edited Nov 15 '10 at 16:47

answered Nov 15 '10 at 16:38

Darin Dimitrov

Bobrovsky · Answer 2 · 2020-08-07T12:48:13.230

score 2

The Docotic.Pdf library (disclaimer: I work for the company) can be used to accomplish this task.

Please take a look at my answer to a similar question.

Beyond that, the library can of course do many other things.

edited Aug 07 '20 at 12:48

answered May 31 '11 at 17:54

Bobrovsky

score 1 · Answer 3 · answered Nov 15 '10 at 16:39

How about this:

http://glenswords.wordpress.com/2007/07/16/extract-the-title-of-a-pdf-using-c/

Randy Minder

+1. You might want to add something between the `<<` and the `/Title` since stuff like `/CreationDate` might show up first. This is definitely cheating and is a dirty rotten hack (and using the solution as written is probably a bad idea), but it has the advantage over the other solutions of not requiring a giant library for a rather tiny feature. – Brian Nov 15 '10 at 16:48
I completely agree with Brian: this is a light solution for a tiny feature. – apros Nov 15 '10 at 17:14
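
For readers who want to see what such a lightweight approach might look like, here is a minimal sketch. It is not the code from the linked post. The class name `PdfTitleSniffer` and both method names are hypothetical, chosen only for illustration. It assumes the Title is stored as an uncompressed, unencrypted literal string such as `/Title (My Document)`, and it does not handle hex strings, UTF-16 titles, object streams, or XMP. Because it does not parse the file, it may also pick up a `/Title` that belongs to a bookmark rather than to the document information dictionary.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: scans raw PDF bytes for the first "/Title (...)" entry.
static class PdfTitleSniffer
{
    // Matches "/Title (...)" where the literal string may contain
    // backslash escapes but no unescaped nested parentheses.
    static readonly Regex TitlePattern =
        new Regex(@"/Title\s*\(((?:\\.|[^\\()])*)\)", RegexOptions.Singleline);

    public static string ExtractTitle(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // sections of the file survive the decoding step unchanged.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        Match m = TitlePattern.Match(raw);
        if (!m.Success)
            return null; // no literal-string Title found

        // Undo only the simplest escapes: \( \) and \\
        return Regex.Replace(m.Groups[1].Value, @"\\([()\\])", "$1");
    }

    public static string ExtractTitleFromFile(string path)
    {
        return ExtractTitle(File.ReadAllBytes(path));
    }
}
```

Calling `PdfTitleSniffer.ExtractTitleFromFile("test.pdf")` returns the first match or `null`. Since the pattern anchors on `/Title` itself rather than on `<<`, entries such as `/CreationDate` appearing earlier in the dictionary do not get in the way. For anything beyond a quick check, a real PDF library is the safer choice.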

score 0 · Answer 4 · answered Nov 15 '10 at 16:45

One alternative to iTextSharp is PDFBox. See the CodeProject tutorial for instructions on using it. This is slightly ugly, since you're basically running a Java VM inside your C# application, but it's actually really easy to use.

Brian

score 0 · Answer 5 · answered Nov 15 '10 at 16:53

If by "Title" you mean the Title entry in the document information (Info) dictionary referenced from the trailer of the PDF, then you can use a number of different tools. iTextSharp will do it, although I don't know the API well enough to give you code.

If you use dotImage, from Atalasoft (where I work, and incidentally, I wrote this code), you can do this:

PdfDocumentMetadata metadata = PdfDocumentMetadata.FromStream(sourceStream);
Console.WriteLine("Title is \"{0}\"", metadata.Title);

This class also gives you Author, Subject, Keywords, Creator, Producer, CreationDate, ModificationDate, Trapped, and custom fields.

If you're talking about finding the title in XMP embedded in the PDF - well, that's a different beast entirely and I don't yet have support for pulling that out.
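
To illustrate why the XMP case differs, here is a rough sketch of reading the title from an XMP packet that has already been pulled out of the PDF as a string. Getting that packet out is left to whatever PDF library you use. The class name `XmpTitleReader` is hypothetical. The sketch assumes the title is stored the way XMP normally represents it: a `dc:title` element holding an `rdf:Alt` list of language alternatives.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads dc:title from an already-extracted XMP packet.
static class XmpTitleReader
{
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    public static string ExtractTitle(string xmpPacket)
    {
        XDocument doc = XDocument.Parse(xmpPacket);

        // dc:title is a language alternative:
        // <dc:title><rdf:Alt><rdf:li xml:lang="x-default">...</rdf:li></rdf:Alt></dc:title>
        var items = doc.Descendants(Dc + "title")
                       .Descendants(Rdf + "li")
                       .ToList();

        // Prefer the default-language entry, otherwise take the first one.
        XElement chosen =
            items.FirstOrDefault(li => (string)li.Attribute(XNamespace.Xml + "lang") == "x-default")
            ?? items.FirstOrDefault();

        return chosen == null ? null : chosen.Value;
    }
}
```

The Info dictionary title and the XMP title are stored separately and can disagree, so it is worth deciding up front which one your application treats as authoritative.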

Thank you very much for your post. From my point of view, your solution seems the most attractive of the commercial libraries for my issue. – apros, Nov 15 '10 at 17:18
