3

I have a folder of PDF files and would like to know the best way, either with free PDF page-counting software or programmatically, to count the pages in each PDF and put the results in an Excel or Access table. The table is already populated with the PDF filenames. I googled "PDF page counter" and got a number of hits, but I'm not sure how trustworthy these tools are. So, what are some trustworthy PDF page-counting tools, and alternatively, are there any good VB.NET code samples that do this?

Thank you!

artwork21
  • Possible duplicate of [Determine number of pages in a PDF file](http://stackoverflow.com/questions/320281/determine-number-of-pages-in-a-pdf-file) – Amedee Van Gasse Feb 24 '16 at 08:20
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – David van Driessche Feb 24 '16 at 12:45

4 Answers

6

I would recommend the iText PDF library (http://www.itextpdf.com/). It's a Java library, but it has also been ported to C# if you are more comfortable with that.

Once you've imported the library, the Java code to get the number of pages in a PDF is:

import com.itextpdf.text.pdf.PdfReader; // package name in iText 5.x

PdfReader pr = new PdfReader("/path/to/yourFile.pdf");
return pr.getNumberOfPages();
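
To get from the two iText calls above to the table the question asks for, one option is to write a CSV file that Excel or Access can import. The following is a minimal sketch, not part of iText: `PdfPageReport`, `buildCsv`, and `quote` are hypothetical names, and the page counter is passed in as a function so that the sketch itself only needs the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Stream;

// Hypothetical helper (not an iText class): builds "file name, page count"
// CSV lines for every .pdf file directly inside a folder.
class PdfPageReport {

    // pageCounter is whatever function returns the page count for one file;
    // with iText it would wrap the PdfReader calls shown in the answer.
    static List<String> buildCsv(Path folder, ToIntFunction<Path> pageCounter) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("FileName,NumberOfPages");
        try (Stream<Path> files = Files.list(folder)) {
            files.filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                 .sorted()
                 .forEach(p -> lines.add(
                         quote(p.getFileName().toString()) + "," + pageCounter.applyAsInt(p)));
        }
        return lines;
    }

    // Quote a field so that commas or quotes in a file name do not break the CSV.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

The returned lines can be written out with `Files.write(outputPath, lines)`. With iText 5, the counter passed in would open a `PdfReader` on `path.toString()`, return `getNumberOfPages()`, and convert the checked `IOException` into an unchecked one, since `ToIntFunction` cannot throw it.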
Reece
4

I had the same problem in the past. I used the pdftk tool from PowerShell:

dir c:\ *.pdf | foreach-object {

    $pdf = pdftk.exe $_.FullName dump_data
    $NumberOfPages = [regex]::match($pdf,'NumberOfPages: (\d+)').Groups[1].Value

    New-Object PSObject -Property @{
        Name = $_.Name
        FullName = $_.FullName
        NumberOfPages = $NumberOfPages
    }
} | select name,fullname,numberofpages | export-csv -notypeinformation d:\list.txt 

After some testing I realized that this failed on protected PDFs. I solved that by using iTextSharp to detect them:

[void][System.Reflection.Assembly]::LoadFrom("c:\itextsharp\itextsharp.dll")
gci -path c:\ *.pdf | foreach-object{

    $itext = new-object itextsharp.text.pdf.PdfReader($_.fullname)
    if (-not $itext.IsEncrypted()) {
        $pdf = pdftk.exe $_.FullName dump_data
        $NumberOfPages = [regex]::match($pdf,'NumberOfPages: (\d+)').Groups[1].Value

        New-Object PSObject -Property @{
            Name = $_.Name
            FullName = $_.FullName
            NumberOfPages = $NumberOfPages
        }
    }
    else {
        New-Object PSObject -Property @{
            Name = $_.Name
            FullName = $_.FullName
            NumberOfPages = "encrypted"
        }
    }

} |Select-Object name,fullname,numberofpages | export-csv -notypeinformation d:\list2.txt 

Hope that helps.

Edit: please note that a large part of the script was written by Shay Levy, a PowerShell guru :)

Nicola Cossu
1

Following Nick's solution, you can avoid pdftk altogether and use only iTextSharp.

Why would you want that? Well, it turns out that pdftk can't read some PDF files that iTextSharp can (it fails with a java.lang.NullPointerException). In fact, I first wrote a function using pdftk and regular expressions, but I had to switch to iTextSharp because of these exceptions.

The function is the following (and pretty straightforward to follow):

function Count-PdfPages {
    Param([System.IO.FileSystemInfo]$file)

    # Load the iTextSharp assembly (adjust the path to your copy of the DLL)
    [void][System.Reflection.Assembly]::LoadFrom("C:\Users\me\Desktop\itextsharp-all-5.3.4\itextsharp.dll")

    $itext = New-Object itextsharp.text.pdf.PdfReader($file.FullName)

    if (-not $itext.IsEncrypted()) {
        # The page count is read straight from the PdfReader
        return $itext.NumberOfPages
    }
    else {
        return "The file $($file.FullName) is encrypted"
    }
}
# Example
Set-Location 'C:\Users\me\Desktop\Nueva carpeta'

Get-ChildItem | Where-object{$_.extension -eq '.pdf'} | ForEach-Object{Count-PdfPages $_}
mechantid
0

One Line:

Dim pdfPageCount As Integer = System.IO.File.ReadAllText("example.pdf").Split(New String() {"/Type /Page"}, StringSplitOptions.None).Count() - 2

Recommended: iTextSharp

Imports iTextSharp.text.pdf

Dim pdfPath As String = "test.pdf"
Dim pdfReader As New PdfReader(pdfPath)
Dim numberOfPages As Integer = pdfReader.NumberOfPages
Medo Medo
  • 2
    I edited your answer because your link points to SourceForge, which is now obsolete. All iText projects, including iTextSharp, moved to GitHub a year ago. – Amedee Van Gasse Feb 24 '16 at 08:22
  • 1
    The "One Line" solution only works for some documents. The names **Type** and **Page** may have any amount and kind of whitespace between them, including none. There may also be comments in between. Then there may be unused page objects. Then there are alternative ways to write the names. Furthermore, page objects might be stored away in object streams. Etc. etc. etc.... – mkl Feb 24 '16 at 09:10
  • 1
    Yeah, you are right. I prefer to use a third-party library like iTextSharp. – Medo Medo Feb 24 '16 at 09:28
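
The whitespace point in mkl's comment can be made concrete with a small sketch. It assumes a made-up three-object fragment of uncompressed PDF text (not a real file), and `NaivePageScan`, `countLiteral`, and `countTolerant` are hypothetical names. `countLiteral` mirrors the "One Line" approach (split on the literal `/Type /Page`, then subtract 2, one for the extra split part and one for the `/Type /Pages` node that also contains the literal); `countTolerant` allows any whitespace between the two names. Neither is a reliable counter, since text scanning cannot handle comments, unused page objects, alternative name spellings, or object streams.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: text scanning is NOT a reliable way to count PDF pages.
class NaivePageScan {

    // "/Type" then optional whitespace then "/Page", not followed by a letter,
    // so the page tree node "/Type /Pages" is excluded.
    private static final Pattern PAGE = Pattern.compile("/Type\\s*/Page(?![A-Za-z])");

    // Same logic as the "One Line" answer: literal split, minus 2.
    static int countLiteral(String raw) {
        return raw.split(Pattern.quote("/Type /Page"), -1).length - 2;
    }

    // Slightly more tolerant: accepts "/Type/Page", "/Type  /Page", and so on.
    static int countTolerant(String raw) {
        Matcher m = PAGE.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up fragment with two pages; the second has no space between the names.
        String sample =
                "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
              + "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
              + "3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n";
        System.out.println(countLiteral(sample));  // prints 1: the second page is missed
        System.out.println(countTolerant(sample)); // prints 2
    }
}
```

Even the tolerant version would miscount a file whose page objects sit in compressed object streams, which is why a PDF library is the safer choice here.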