
Given an existing PDF file whose pages are in portrait orientation, how can I process the file programmatically (with .NET) to generate a new file with the same content on pages in landscape orientation?

The new pages should take full advantage of the available landscape width. The number of pages might increase because an existing portrait page might not fit into a single landscape page.

Backstory: we use the Google Sheets REST API to generate PDF documents. If there are a lot of columns, the document can be very wide. Unfortunately, the Google Drive REST API always exports in portrait mode and doesn't offer an option to switch to landscape.

Here is an example PDF file that we need to process: https://drive.google.com/file/d/1dVf1GD7zmDx9wJhseGEbfPCVYTJbN-uG/view?usp=sharing

Clement
  • This is a much more difficult question than you seem to imagine. To take an existing (portrait) PDF, switch it to landscape and somehow have it re-organise itself to take up all the landscape room is non-trivial. PDF does not really have the concept of 'page overflow' in quite the same way as Word does. This question is also quite broad - you need to look into some of the available PDF libraries for C#. – Paddy Jun 23 '20 at 09:36
  • If you shared an example document and that document was representative enough, there might be a chance. And given your Backstory, your documents quite likely are similar enough for such representative examples... – mkl Jun 23 '20 at 14:20
  • @mkl I added a link to an example document. You're right, the documents are very similar and follow the same tabular/spreadsheet structure. – Clement Jun 24 '20 at 09:10

2 Answers


You can do that using the Docotic.Pdf library. The simplest solution is to convert every source page to an XObject, scale it so that it fills the landscape width, and draw it across as many target pages as needed.
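Before the full sample, the scaling arithmetic behind this approach can be sketched on its own. This is a minimal illustration with no PDF library involved; the class and method names (`LandscapeGeometry`, `ScaleToLandscape`, `EstimateLandscapePages`) are hypothetical and do not come from Docotic.Pdf. It assumes the landscape page is simply the portrait page with width and height swapped, and that margins are not scaled.

```csharp
using System;

public static class LandscapeGeometry
{
    // Factor by which portrait content must grow so that its width
    // fills a landscape page (portrait width and height swapped).
    public static double ScaleToLandscape(double portraitWidth, double portraitHeight)
    {
        if (portraitWidth <= 0 || portraitHeight <= 0)
            throw new ArgumentException("Page dimensions must be positive.");

        return portraitHeight / portraitWidth;
    }

    // Rough estimate of how many landscape pages one portrait page needs.
    // It ignores snapping to row boundaries, so the real count can be higher.
    public static int EstimateLandscapePages(
        double portraitWidth, double portraitHeight,
        double contentHeight, double topMargin, double bottomMargin)
    {
        double scale = ScaleToLandscape(portraitWidth, portraitHeight);
        double scaledContentHeight = contentHeight * scale;

        // The landscape page height equals the portrait page width.
        double usableHeight = portraitWidth - topMargin - bottomMargin;
        if (usableHeight <= 0)
            throw new ArgumentException("Margins leave no usable height.");

        return (int)Math.Ceiling(scaledContentHeight / usableHeight);
    }
}
```

For an A4-sized portrait page (595 x 842 points) the scale is about 1.415, so a page that is nearly full of rows will typically spill onto a second landscape page.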

Here is the sample:

using System.Diagnostics; // required for Debug.Assert
using System.Linq;
using BitMiracle.Docotic.Pdf;

namespace SplitToMultiplePages
{
    public static class SplitToMultiplePages
    {
        public static void Main()
        {
            // NOTE: 
            // When used in trial mode, the library imposes some restrictions.
            // Please visit http://bitmiracle.com/pdf-library/trial-restrictions.aspx
            // for more information.
            BitMiracle.Docotic.LicenseManager.AddLicenseData("temporary or permanent license key here");

            using (var src = new PdfDocument(@"Example.pdf"))
            {
                // Calculate common parameters based on the first page.
                // That makes sense when all pages have the same size, portrait orientation, and margins.
                PdfPage srcPage = src.Pages[0];
                PdfCollection<PdfTextData> words = srcPage.GetWords();

                double topMargin = words[0].Position.Y;
                double bottomMargin = srcPage.Height - words[words.Count - 1].Bounds.Bottom;
                double scale = srcPage.Height / srcPage.Width;
                const int BorderHeight = 1;

                // This sample shows how to convert existing PDF content in portrait orientation to landscape
                Debug.Assert(scale > 1);

                using (var dest = new PdfDocument())
                {
                    bool addDestPage = false;
                    double destPageY = topMargin;
                    for (int s = 0; s < src.PageCount; ++s)
                    {
                        if (s > 0)
                        {
                            srcPage = src.Pages[s];
                            words = srcPage.GetWords();
                        }

                        // skip empty pages
                        if (words.Count == 0)
                            continue;

                        // Get content of the source page, scale to landscape and draw on multiple pages
                        double textStartY = words[0].Bounds.Top;
                        double[] lineBottomPositions = words
                            .Select(w => (w.Bounds.Bottom - textStartY + BorderHeight) * scale)
                            .Distinct()
                            .ToArray();
                        double contentHeight = lineBottomPositions[lineBottomPositions.Length - 1];

                        PdfXObject xobj = dest.CreateXObject(srcPage);

                        double remainingHeight = contentHeight;
                        while (true)
                        {
                            PdfPage destPage = addDestPage ? dest.AddPage() : dest.Pages[dest.PageCount - 1];
                            destPage.Width = srcPage.Height;
                            destPage.Height = srcPage.Width;
                            double availableHeight = destPage.Height - destPageY - bottomMargin;
                            if (remainingHeight > availableHeight)
                                availableHeight = adjustToNearestLine(availableHeight, lineBottomPositions);

                            PdfCanvas destCanvas = destPage.Canvas;
                            destCanvas.SaveState();

                            destCanvas.TranslateTransform(0, destPageY);
                            destCanvas.AppendRectangle(new PdfRectangle(0, 0, destPage.Width, availableHeight), 0);
                            destCanvas.SetClip(PdfFillMode.Winding);

                            double y = -topMargin * scale - (contentHeight - remainingHeight);
                            destCanvas.DrawXObject(xobj, 0, y, xobj.Width * scale, xobj.Height * scale, 0);

                            destCanvas.RestoreState();

                            if (remainingHeight <= availableHeight)
                            {
                                // Move to next source page
                                addDestPage = false;
                                destPageY = remainingHeight + bottomMargin;
                                break;
                            }

                            // Need more pages in the resulting document
                            remainingHeight -= availableHeight;
                            addDestPage = true;
                            destPageY = topMargin;
                        }
                    }

                    // Optionally you can use Single Column layout by default
                    //dest.PageLayout = PdfPageLayout.OneColumn;

                    dest.Save("SplitToMultiplePages.pdf");
                }
            }
        }

        private static double adjustToNearestLine(double height, double[] lineHeights)
        {
            // TODO: Use binary search for better performance

            for (int i = lineHeights.Length - 1; i >= 0; --i)
            {
                double lh = lineHeights[i];
                if (height > lh)
                    return lh;
            }

            return lineHeights[0];
        }
    }
}

The sample produces the following result: https://drive.google.com/file/d/1ITtV3Uw84wKd9mouV4kBpPoeWtsHlB9A/view?usp=sharing

[Screenshot of the resulting document]
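The sample leaves a TODO about replacing the linear scan in adjustToNearestLine with a binary search. One possible version is sketched below. It assumes the array of line bottoms is sorted in ascending order, which the sample relies on implicitly but does not enforce; the class name `LineSnapping` is hypothetical. It keeps the original behaviour: return the largest line bottom strictly below the available height, or the first line bottom if none fits.

```csharp
using System;

public static class LineSnapping
{
    // Returns the largest value in sortedLineBottoms that is strictly less
    // than height. Falls back to the first element when no value is smaller,
    // matching the linear version in the sample.
    public static double AdjustToNearestLine(double height, double[] sortedLineBottoms)
    {
        if (sortedLineBottoms == null || sortedLineBottoms.Length == 0)
            throw new ArgumentException("At least one line position is required.");

        // Find the first index whose value is >= height (lower bound).
        int lo = 0;
        int hi = sortedLineBottoms.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sortedLineBottoms[mid] < height)
                lo = mid + 1;
            else
                hi = mid;
        }

        // lo now equals the number of elements strictly below height.
        return lo > 0 ? sortedLineBottoms[lo - 1] : sortedLineBottoms[0];
    }
}
```

If the word order returned by the library is not guaranteed to be top to bottom, sorting the array once with Array.Sort before the loop would make this assumption safe.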

Based on your requirements you can also skip headers on all pages except the first one. Here is the sample for this case:

using System.Diagnostics; // required for Debug.Assert
using System.Linq;
using BitMiracle.Docotic.Pdf;

namespace SplitToMultiplePages
{
    public static class SplitToMultiplePages
    {
        public static void Main()
        {
            // NOTE: 
            // When used in trial mode, the library imposes some restrictions.
            // Please visit http://bitmiracle.com/pdf-library/trial-restrictions.aspx
            // for more information.
            BitMiracle.Docotic.LicenseManager.AddLicenseData("temporary or permanent license key here");

            using (var src = new PdfDocument(@"Example.pdf"))
            {
                // Calculate common parameters based on the first page.
                // That makes sense when all pages have the same size, portrait orientation, and margins.
                PdfPage srcPage = src.Pages[0];
                PdfCollection<PdfTextData> words = srcPage.GetWords();

                double topMargin = words[0].Position.Y;
                double bottomMargin = srcPage.Height - words[words.Count - 1].Bounds.Bottom;
                double scale = srcPage.Height / srcPage.Width;
                const int BorderHeight = 1;

                // This sample shows how to convert existing PDF content in portrait orientation to landscape
                Debug.Assert(scale > 1);

                using (var dest = new PdfDocument())
                {
                    bool addDestPage = false;
                    double destPageY = topMargin;
                    for (int s = 0; s < src.PageCount; ++s)
                    {
                        if (s > 0)
                        {
                            srcPage = src.Pages[s];
                            words = srcPage.GetWords();
                        }

                        // skip empty pages
                        if (words.Count == 0)
                            continue;

                        // Get content of the source page, scale to landscape and draw on multiple pages
                        double textStartY = words[0].Bounds.Top;
                        
                        // Skip the header line of all pages except first
                        if (s > 0)
                        {
                            // Project to double? so that FirstOrDefault yields null (not 0) when no data row exists
                            double? firstDataRowY = words.Select(w => (double?)w.Bounds.Top).FirstOrDefault(y => y > textStartY);
                            if (!firstDataRowY.HasValue)
                                continue;

                            textStartY = firstDataRowY.Value;
                        }

                        double[] lineBottomPositions = words
                            .Select(w => (w.Bounds.Bottom - textStartY + BorderHeight) * scale)
                            .Distinct()
                            .ToArray();
                        double contentHeight = lineBottomPositions[lineBottomPositions.Length - 1];

                        PdfXObject xobj = dest.CreateXObject(srcPage);

                        double remainingHeight = contentHeight;
                        while (true)
                        {
                            PdfPage destPage = addDestPage ? dest.AddPage() : dest.Pages[dest.PageCount - 1];
                            destPage.Width = srcPage.Height;
                            destPage.Height = srcPage.Width;
                            double availableHeight = destPage.Height - destPageY - bottomMargin;
                            if (remainingHeight > availableHeight)
                                availableHeight = adjustToNearestLine(availableHeight, lineBottomPositions);

                            PdfCanvas destCanvas = destPage.Canvas;
                            destCanvas.SaveState();

                            destCanvas.TranslateTransform(0, destPageY);
                            destCanvas.AppendRectangle(new PdfRectangle(0, 0, destPage.Width, availableHeight), 0);
                            destCanvas.SetClip(PdfFillMode.Winding);

                            double y = -textStartY * scale - (contentHeight - remainingHeight);
                            destCanvas.DrawXObject(xobj, 0, y, xobj.Width * scale, xobj.Height * scale, 0);

                            destCanvas.RestoreState();

                            if (remainingHeight <= availableHeight)
                            {
                                // Move to the next source page
                                addDestPage = false;
                                destPageY = remainingHeight + bottomMargin;
                                break;
                            }

                            // Need more pages in the resulting document
                            remainingHeight -= availableHeight;
                            addDestPage = true;
                            destPageY = topMargin;
                        }
                    }

                    // Optionally you can use Single Column layout by default
                    //dest.PageLayout = PdfPageLayout.OneColumn;

                    dest.Save("SplitToMultiplePages.pdf");
                }
            }
        }

        private static double adjustToNearestLine(double height, double[] lineHeights)
        {
            // TODO: Use binary search for better performance

            for (int i = lineHeights.Length - 1; i >= 0; --i)
            {
                double lh = lineHeights[i];
                if (height > lh)
                    return lh;
            }

            return lineHeights[0];
        }
    }
}

The resulting file for the "skip headers" sample: https://drive.google.com/file/d/1v9lPYIposkNNgheUzz8kD3XSwMxGBJIz/view?usp=sharing
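The header-skipping step can be looked at in isolation. The sketch below works on plain Y coordinates instead of Docotic.Pdf word objects, and the names `HeaderSkip` and `FirstDataRowTop` are hypothetical. It assumes that all words of the header row share the same top coordinate and that coordinates arrive in reading order, as in the exported spreadsheets discussed here. Projecting to `double?` matters: with plain `double`, `FirstOrDefault` would return 0 rather than "no value" for a page that contains only a header.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HeaderSkip
{
    // Returns the top coordinate of the first row below the header row,
    // or null when the page has no words or only a header row.
    public static double? FirstDataRowTop(IEnumerable<double> wordTops)
    {
        double[] tops = wordTops.ToArray();
        if (tops.Length == 0)
            return null;

        // The first word is assumed to belong to the header row.
        double headerTop = tops[0];

        // Nullable projection: the default value is null, not 0.
        return tops
            .Select(t => (double?)t)
            .FirstOrDefault(t => t > headerTop);
    }
}
```

If header words can differ slightly in their top coordinate, comparing against `headerTop` plus a small tolerance would be a reasonable adjustment.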

Vitaliy Shibaev
  • I tried running this sample and got an ArgumentOutOfRangeException on double contentHeight = words[words.Count - 1].Bounds.Bottom - words[0].Bounds.Top; – Clement Jun 30 '20 at 11:56
  • In addition, checking the result file that you shared, lines and words at the bottom of the page get cut off. – Clement Jun 30 '20 at 11:57
  • I've updated the sample to fix execution in trial mode. It's worth trying the sample with the temporary key from [here](https://bitmiracle.com/pdf-library/download-pdf-library.aspx) – Vitaliy Shibaev Jun 30 '20 at 13:01
  • I also updated the sample so that it no longer truncates rows between pages. The resulting document on Google Drive was updated too. You can fully control the math to split content as required. – Vitaliy Shibaev Jun 30 '20 at 13:47
  • Thanks, I marked as an answer but there is still a problem. The header row that is on top of each page in the original document ends up in the middle of the re-arranged pages. Do you know how to fix that? – Clement Jul 01 '20 at 07:27
  • I have extended the answer - added a sample that skips all headers except the first. Look at the second part of the answer. – Vitaliy Shibaev Jul 01 '20 at 13:26

Using iTextSharp (iText for .NET v5.5.13) and the PdfVeryDenseMergeTool and PageVerticalAnalyzer classes from this question (see "UPDATE 2" and "UPDATE 3", where the OP posted his C# port of the Java solution from the accepted answer), you can process the file like this:

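using System.Collections.Generic;
using System.IO;
using iTextSharp.text;      // RectangleReadOnly
using iTextSharp.text.pdf;  // PdfReader

List<PdfReader> Files = new List<PdfReader>();
Files.Add(new PdfReader(@"Example.pdf"));

// 595 x 420 points is a landscape page (A5-sized)
PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(new RectangleReadOnly(595, 420), 18, 18, 10);

using (MemoryStream ms = new MemoryStream())
{
    tool.Merge(ms, Files);
    byte[] bytes = ms.ToArray();
    // bytes contains the result
}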

I get a five-page landscape PDF that looks like this:

[Screenshot: page thumbnails]

mkl
  • iTextSharp is marked as deprecated, and PdfVeryDenseMergeTool and PageVerticalAnalyzer don't even compile with iText (missing types, etc.) – Clement Jun 30 '20 at 11:50
  • *"iTextSharp is marked as deprecated"* - as you didn't mention library restrictions in your question, that didn't appear to matter. But there is an equivalent class for iText 7. *"don't even compile with iText"* - they do compile. With iText 5.5.x. – mkl Jun 30 '20 at 15:38