
I am currently working in VB.NET. My company is going paperless and I want to do a cost-savings analysis on paper. Currently we save all of our PDF files on a server, under a path like "Server>Folder1>Folder2>Folder3>Folder4>PDF files". Folders 1 and 2 are always the same and are only used to navigate through, Folder 3 is one folder per department, and Folder 4 is one folder per job. Each Folder 4 contains multiple PDF files. Put simply, the names of Folders 1 and 2 are static while Folders 3 and 4 are dynamic. To make things even harder, the PDF files inside each Folder 4 all have different names.

I have the bit of code below that detects how many pages a PDF has without opening it in a viewer, but it requires the file path. Since there are hundreds, if not over a thousand, PDF files, I want to programmatically loop through all of them, check whether each file is a PDF, and sum all of the pages found. I can then use that number to calculate the cost savings of going paperless.

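 PdfReader pr = new PdfReader("/path/to/yourFile.pdf"); // iText 5 (Java): com.itextpdf.text.pdf.PdfReader
 int pages = pr.getNumberOfPages();
 pr.close(); // release the file handle before moving on to the next PDF
 return pages;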

Another idea would be to somehow merge all the files together into a single PDF file, which would make it as simple as opening that file to see how many pages there are.
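Since Folder1 and Folder2 never change and the structure below them is always department, then job, then PDF files, one option is to walk exactly those two dynamic levels and keep a running total per department, which may be handy for the cost-savings breakdown. This is a minimal sketch in Java, like the iText snippet above, assuming `root` is the Folder2 path; `DepartmentPageTotals` and the `pageCounter` parameter are hypothetical names, where `pageCounter` stands in for whatever per-file counter you use (for example a wrapper around `getNumberOfPages()` that catches its `IOException`). The same shape maps onto VB.NET's `Directory.GetDirectories` and `Directory.GetFiles`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

final class DepartmentPageTotals {

    /** Pages per department for the layout root/<department>/<job>/*.pdf (hypothetical helper). */
    static Map<String, Long> totalsByDepartment(Path root, ToIntFunction<Path> pageCounter)
            throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        // Folder3: one sub-folder per department
        try (DirectoryStream<Path> departments =
                 Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path dept : departments) {
                long deptPages = 0;
                // Folder4: one sub-folder per job
                try (DirectoryStream<Path> jobs =
                         Files.newDirectoryStream(dept, p -> Files.isDirectory(p))) {
                    for (Path job : jobs) {
                        // File names vary, so only the extension is checked
                        try (DirectoryStream<Path> files = Files.newDirectoryStream(job)) {
                            for (Path file : files) {
                                if (Files.isRegularFile(file) && isPdf(file)) {
                                    deptPages += pageCounter.applyAsInt(file);
                                }
                            }
                        }
                    }
                }
                totals.put(dept.getFileName().toString(), deptPages);
            }
        }
        return totals;
    }

    /** Case-insensitive, so "report.PDF" counts as well as "report.pdf". */
    static boolean isPdf(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

Summing the values of the returned map gives the overall page count.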

Cheddar
  • Does it have to be in vb.net? This task sounds like it would be much simpler in a shell scripting language. – Eli Sadoff Oct 19 '16 at 18:19
  • @EliSadoff it may very well be easier in another language but I am only familiar with C# and VB.NET. If the code isn't hard I may be able to figure it out. – Cheddar Oct 19 '16 at 18:20
  • Seems like a recursive sub that checks each directory's files and then checks whether it has any sub-directories would work great. If sub-directories are found, have it call itself again and perform the same checks on each sub-directory, etc. – soohoonigan Oct 19 '16 at 18:21
  • @soohoonigan do you know of any example code on the web that would help point me in the right direction? – Cheddar Oct 19 '16 at 18:22
  • [This Answer](http://stackoverflow.com/a/929277/6664878) is in C# and has the recursive structure there. This answer just prints the filenames out, but it's a good example of the logic you need. You'd basically just have to swap your pdf logic in where the filenames are being printed – soohoonigan Oct 19 '16 at 18:25
  • @soohoonigan Let me do some reading on that link and I will get back to this post. – Cheddar Oct 19 '16 at 18:34
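The recursion described in the comments can be sketched in Java as follows: a method adds up the pages of every PDF in a folder and then calls itself on each sub-folder, so it does not matter how many levels sit below Folder2. This is a minimal sketch; `RecursivePdfPages` and the `pageCounter` parameter are hypothetical names, with `pageCounter` standing in for any per-file page counter. The extension check is case-insensitive so files saved as `.PDF` are included.

```java
import java.io.File;
import java.util.Locale;
import java.util.function.ToIntFunction;

final class RecursivePdfPages {

    /** Total pages of all PDFs in folder and, recursively, in its sub-folders. */
    static long sumPages(File folder, ToIntFunction<File> pageCounter) {
        File[] entries = folder.listFiles();
        if (entries == null) {
            return 0; // not a directory, or it could not be read
        }
        long total = 0;
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Recurse: the method calls itself on each sub-folder
                total += sumPages(entry, pageCounter);
            } else if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                total += pageCounter.applyAsInt(entry);
            }
        }
        return total;
    }
}
```

Called once on the Folder2 path, it returns the grand total across all departments and jobs.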

1 Answer


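Here is a VBA (not VB.NET) solution; run the code from a module in Excel. It writes each PDF's name and page count to the active sheet. Note that it only looks at the files directly inside sFolder and does not descend into sub-folders.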

Sub PDFandNumPages()

   Dim Folder As Object
   Dim file As Object
   Dim fso As Object
   Dim iExtLen As Integer, iRow As Long
   Dim sFolder As String, sExt As String

   sExt = "pdf"
   iExtLen = Len(sExt)
   iRow = 1
   ' Folder to scan (only this folder; sub-folders are not visited)
   sFolder = "C:\your_path_here\"

   Set fso = CreateObject("Scripting.FileSystemObject")

   If sFolder <> "" Then
      Set Folder = fso.GetFolder(sFolder)
      For Each file In Folder.Files
         ' Case-insensitive extension check so ".PDF" files are counted too
         If LCase(Right(file.Name, iExtLen)) = sExt Then
            Cells(iRow, 1).Value = file.Name
            ' file.Path is the full path, so no separator handling is needed
            Cells(iRow, 2).Value = pageCount(file.Path)
            iRow = iRow + 1
         End If
      Next file
   End If

End Sub

Function pageCount(sFilePathName As String) As Integer

   Dim nFileNum As Integer
   Dim sInput As String
   Dim sNumPages As String
   Dim iPosN1 As Long, iPosN2 As Long
   Dim iPosCount1 As Long, iPosCount2 As Long
   Dim iEndsearch As Integer

   ' Default to 0 pages if neither marker is found before EOF
   sNumPages = "0"

   ' Get an available file number from the system
   nFileNum = FreeFile

   ' Open the PDF file in Binary mode, read-only
   Open sFilePathName For Binary Access Read As #nFileNum

   ' Scan for "/N " (linearized files) or "/Count " (page tree)
   Do Until EOF(nFileNum)
      Input #nFileNum, sInput
      sInput = UCase(sInput)
      iPosN1 = InStr(1, sInput, "/N ") + 3
      iPosN2 = InStr(iPosN1, sInput, "/")
      iPosCount1 = InStr(1, sInput, "/COUNT ") + 7
      iPosCount2 = InStr(iPosCount1, sInput, "/")

      ' Only take a value when a closing "/" follows it (avoids a negative Mid length)
      If iPosN1 > 3 And iPosN2 > iPosN1 Then
         sNumPages = Mid(sInput, iPosN1, iPosN2 - iPosN1)
         Exit Do
      ElseIf iPosCount1 > 7 And iPosCount2 > iPosCount1 Then
         sNumPages = Mid(sInput, iPosCount1, iPosCount2 - iPosCount1)
         Exit Do
      ' Give up after ~1000 reads and leave the page count at 0
      ElseIf iEndsearch > 1001 Then
         Exit Do
      End If
      iEndsearch = iEndsearch + 1
   Loop

   ' Close the PDF file
   Close #nFileNum
   ' Val tolerates trailing characters such as ">>" that would make CInt fail
   pageCount = Val(sNumPages)

End Function

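Here is an alternative that does essentially the same thing, but counts the "/Type /Page" objects in each file with a regular expression instead of looking for a page-count entry.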

Sub Test()
    Dim MyPath As String, MyFile As String
    Dim i As Long
    ' Folder to scan, with a trailing "\" (sub-folders are not visited)
    MyPath = "C:\your_path_here\"
    ' Dir with no attribute argument returns normal files only
    MyFile = Dir(MyPath & "*.pdf")
    Range("A:B").ClearContents
    Range("A1") = "File Name": Range("B1") = "Pages"
    Range("A1:B1").Font.Bold = True
    i = 1
    Do While MyFile <> ""
        i = i + 1
        Cells(i, 1) = MyFile
        Cells(i, 2) = GetPageNum(MyPath & MyFile)
        MyFile = Dir
    Loop
    Columns("A:B").AutoFit
    MsgBox "Total of " & i - 1 & " PDF files have been found" & vbCrLf _
           & " File names and corresponding count of pages have been written on " _
           & ActiveSheet.Name, vbInformation, "Report..."
End Sub
'
Function GetPageNum(PDF_File As String) As Long
    'Haluk 19/10/2008
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp As Object
    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True
    ' Matches "/Type /Page" but not "/Type /Pages" (the page-tree nodes)
    RegExp.Pattern = "/Type\s*/Page[^s]"
    FileNum = FreeFile
    ' Read the whole file into a string of the same length
    Open PDF_File For Binary Access Read As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum
    ' Each match is one page object
    GetPageNum = RegExp.Execute(strRetVal).Count
End Function
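For anyone doing this from Java, here is a sketch of the same regular-expression heuristic as GetPageNum, counting "/Type /Page" entries in the raw bytes. It is only a heuristic: page objects stored inside compressed object streams (allowed since PDF 1.5) are not visible as plain text, and files saved with incremental updates can contain superseded copies of a page object, so counts can come out too low or too high. `RegexPageCount` is a hypothetical name; a real PDF library such as iText is the safer choice when exact numbers matter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexPageCount {

    // Same pattern as the VBA version: "/Page" followed by any character except 's'
    private static final Pattern PAGE = Pattern.compile("/Type\\s*/Page[^s]");

    /** Counts page-object markers in raw PDF bytes (heuristic). */
    static int countPages(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives intact
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = PAGE.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Reads the whole file and counts its page-object markers. */
    static int countPages(Path pdf) throws IOException {
        return countPages(Files.readAllBytes(pdf));
    }
}
```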
ASH