I have a SQL Server database with many, many rows. Each row has a column that contains a stored PDF.
The database is about 1 GB, so I'd expect roughly half of that to be the PDFs.
Now I have a requirement to join all those PDFs into one PDF. Don't ask why.
Can you suggest the best way forward, and which component is best suited for this job? There are many existing answers on how to join two (or more) PDFs:
How can I join two PDF's using iTextSharp?
Merge memorystreams to one itext document
How to merge multiple pdf files (generated in run time)?
But what I'm asking about is performance. We are literally dealing with around 50,000 PDFs that need to be merged into one almighty PDF.
[Edit: Solution] This brought the time to merge 1,000 PDFs down from 4m30s to 21s:
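using System;
using System.IO;
using System.Windows.Forms;   // MessageBox
using iTextSharp.text;
using iTextSharp.text.pdf;

public class PdfMerger
{
    public void MergePDFs(string targetPDF, string sourceDir)
    {
        using (FileStream stream = new FileStream(targetPDF, FileMode.Create))
        {
            var files = Directory.GetFiles(sourceDir);
            Document pdfDoc = new Document(PageSize.A4);
            // PdfCopy copies the pages of existing PDFs into the output
            PdfCopy pdf = new PdfCopy(pdfDoc, stream);
            pdfDoc.Open();
            Console.WriteLine("Merging files count: " + files.Length);
            int i = 1;
            var watch = System.Diagnostics.Stopwatch.StartNew();
            foreach (string file in files)
            {
                Console.WriteLine(i + ". Adding: " + file);
                PdfReader reader = new PdfReader(file);
                pdf.AddDocument(reader);
                // Flush this reader's data to the output and release it, so
                // readers and file handles are not held until the very end
                pdf.FreeReader(reader);
                reader.Close();
                i++;
            }
            pdfDoc.Close(); // finishes the output PDF and closes the stream
            watch.Stop();
            var elapsedMs = watch.ElapsedMilliseconds;
            MessageBox.Show(elapsedMs.ToString());
        }
    }
}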
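The solution above reads the PDFs from a directory, while the question has them stored in SQL Server. Below is a minimal sketch, assuming iTextSharp 5 and System.Data.SqlClient, of feeding the rows straight into PdfCopy without exporting them to disk first. The table `dbo.Documents` and the columns `Id` and `PdfData` are hypothetical placeholders for the real schema, and `DbPdfMerger`, `ReadPdfsFromDb` and `MergePdfs` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class DbPdfMerger
{
    // Hypothetical schema: dbo.Documents(Id INT, PdfData VARBINARY(MAX)).
    // Adjust the query to the real table and column names.
    public static IEnumerable<byte[]> ReadPdfsFromDb(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT PdfData FROM dbo.Documents WHERE PdfData IS NOT NULL ORDER BY Id",
            conn))
        {
            cmd.CommandTimeout = 0; // 0 disables the command timeout (default is 30 s)
            conn.Open();
            // SequentialAccess stops the reader from buffering each whole row
            // before it is read; recommended for large BLOB columns
            using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (reader.Read())
                {
                    // Each row is fetched only when the caller asks for the next PDF
                    yield return (byte[])reader.GetValue(0);
                }
            }
        }
    }

    // Appends every PDF in 'pdfs' to one output file and returns how many
    // source documents were added. Each reader is freed right after copying.
    public static int MergePdfs(string targetPdf, IEnumerable<byte[]> pdfs)
    {
        int count = 0;
        using (var stream = new FileStream(targetPdf, FileMode.Create))
        {
            var doc = new Document();
            var copy = new PdfCopy(doc, stream);
            doc.Open();
            foreach (byte[] bytes in pdfs)
            {
                var reader = new PdfReader(bytes);
                copy.AddDocument(reader);
                copy.FreeReader(reader); // write this reader's objects out now
                reader.Close();
                count++;
            }
            doc.Close(); // finishes the output PDF and closes the stream
        }
        return count;
    }
}
```

Usage would look like `DbPdfMerger.MergePdfs(@"C:\out\all.pdf", DbPdfMerger.ReadPdfsFromDb(connectionString))`. Because `ReadPdfsFromDb` is an iterator, the data reader stays open for the whole merge and rows are pulled one at a time rather than loading all 50,000 PDFs up front. `PdfSmartCopy` can replace `PdfCopy` if the output size matters more than speed: it de-duplicates identical resources such as embedded fonts, at the cost of extra CPU and memory.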