I've been working on a VB.NET project that dynamically creates report packs in PDF format from a SQL database and a number of input PDF templates. To cut a long story short, because of the way Business Objects (BOBJ) creates the input files, it would be much more efficient to accept compiled PDF reports as input rather than individual report template pages. For this to work, however, we need to split the input PDF files into sections using the bookmarks created by BOBJ. We don't know in advance how many pages fall within each bookmark, but we need a consistent naming convention for the split files so that the next part of the process can pick up the correct templates and merge them in the required combinations.
The second part of this process is designed and working well using a .NET library called PDFsharp. I have used the samples on their website to write some code that splits an input PDF file into one output file per page, but I do not understand how to split it using the bookmarks.
If I could understand how to parse the PDF and read the bookmark metadata (the name of each bookmark and the start and end page it covers), then I think I could finish it.
An example of the input PDF format is here: https://drive.google.com/open?id=0B0GZGW6CFCI-UWY2WGRvV0dQSWZSNnNOWlp4R21zbFVPZDBn
There are 5 bookmarks (TID01, TID02 ...) and 6 pages. Section TID04 would have two pages output.
The file names I would need would be in the format of "ExamplePDF_TID01.pdf"
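To make the target concrete, here is a minimal sketch of the range and naming logic I'm after. It deliberately leaves out the PDFsharp part I'm stuck on: it assumes the bookmark titles and their zero-based start pages have already been read somehow. The start pages in `Main` are my assumption about the example file, and `Section`, `BuildSections` and `OutputName` are hypothetical names made up for illustration. Each section is taken to run up to the page before the next bookmark, with the last one running to the end of the file.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkRanges
    ' One output section: bookmark title plus the zero-based page range it covers.
    Public Class Section
        Public Property Title As String
        Public Property FirstPage As Integer
        Public Property PageCount As Integer
    End Class

    ' titles(i) starts at startPages(i) (zero-based, ascending order assumed).
    ' A section ends on the page before the next bookmark starts;
    ' the last section ends on the last page of the document.
    Public Function BuildSections(titles As IList(Of String),
                                  startPages As IList(Of Integer),
                                  totalPages As Integer) As List(Of Section)
        Dim sections As New List(Of Section)
        For i As Integer = 0 To titles.Count - 1
            Dim nextStart As Integer = If(i + 1 < titles.Count, startPages(i + 1), totalPages)
            Dim s As New Section()
            s.Title = titles(i)
            s.FirstPage = startPages(i)
            s.PageCount = nextStart - startPages(i)
            sections.Add(s)
        Next
        Return sections
    End Function

    ' Builds names such as "ExamplePDF_TID01.pdf".
    Public Function OutputName(inputName As String, title As String) As String
        Return inputName & "_" & title & ".pdf"
    End Function

    Sub Main()
        Dim titles = New String() {"TID01", "TID02", "TID03", "TID04", "TID05"}
        Dim starts = New Integer() {0, 1, 2, 3, 5}   ' assumed start pages, not read from the file
        For Each s In BuildSections(titles, starts, 6)
            Console.WriteLine(OutputName("ExamplePDF", s.Title) & ": " &
                              s.PageCount & " page(s) starting at index " & s.FirstPage)
        Next
    End Sub
End Module
```

With those assumed start pages, TID04 comes out as two pages and every other section as one. Two bookmarks pointing at the same page would give a zero-page section, which would need handling separately.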
Any help to move forward would be greatly appreciated. Looking at the wiki for the project, it seems that it isn't very active any more, and whilst other people have asked questions about this in the past, I can't find any answers.
Code to Split by Page:
'At the top of the file:
Imports System.IO
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.IO

Sub Splitfiles()
    Dim inputdir As String = "O:\Transformation\Standardisation\Input PDFs"
    Dim outputdir As String = "O:\Transformation\Standardisation\Input PDFs\output\"
    'inputdir = folder path containing input files; only PDF files are picked up
    Dim fileEntries As String() = Directory.GetFiles(inputdir, "*.pdf")
    Dim ccid As String
    Dim outputfilename As String
    For Each filename As String In fileEntries
        'Import mode is needed so that pages can be copied into another document
        Dim importdoc As PdfDocument = PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.Import)
        'File name without folder or extension; expand this to find CC ID
        ccid = Path.GetFileNameWithoutExtension(filename)
        'pageid is the zero-based page index, used as the output file suffix
        For pageid As Integer = 0 To importdoc.PageCount - 1
            'One output document per input page
            Dim outputdoc As PdfDocument = New PdfDocument
            outputdoc.AddPage(importdoc.Pages(pageid))
            outputfilename = outputdir & ccid & "_" & pageid & ".pdf"
            outputdoc.Save(outputfilename)
        Next
    Next
End Sub
And the code I started to split by bookmark but couldn't finish:
Sub SplitPDFByBookmark()
    Dim inputfile As String = "O:\Transformation\Standardisation\Input PDFs\Business Sub Area Report - Project Management - FY16_FP02 - 17062016_0709.PDF"
    Dim outputdir As String = "O:\Transformation\Standardisation\Input PDFs\output\"
    'inputdir = folder path containing input files
    'Dim fileEntries As String() = Directory.GetFiles(inputdir)
    Dim filename As String
    Dim pdfpage As PdfPage
    Dim ccid As String
    Dim pageid As Integer
    Dim outputfilename As String
    filename = inputfile
    'For Each filename In fileEntries
    Dim importdoc As PdfDocument = PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.Import)
    Dim count As Integer = importdoc.PageCount
    Dim x = 0
    For Each bookmark In importdoc.Outlines
        Dim outputdoc As PdfDocument = New PdfDocument
        'This is the line I could not finish: how do I get the page(s) for this bookmark?
        pdfpage = importdoc.Pages(importdoc.Outlines.)
        outputdoc.AddPage(pdfpage)
        pageid = x
        outputfilename = outputdir & "OutputFile_" & pageid & ".pdf"
        outputdoc.Save(outputfilename)
        x = x + 1
    Next
    'Next
End Sub
Thanks in advance for your help!