2

Using PdfSharp.NET, I would like to load an existing PDF file and change all of the elements with a certain color to a different color.

Intuition tells me that it would require looping through each element in a PDF document and then changing the color attribute, but I'm not able to find where to loop through all the elements, much less the color attribute on them.

Is this even possible with PDFsharp, and if so, how would I do it?

Bigbob556677

2 Answers

1

If you are open to alternative libraries, take a look at the Docotic.Pdf library. Disclaimer: I am the author.

You can check and change colors like this:

  1. Copy page objects based on the CopyPageObjects sample
  2. Modify the setBrush and setPen methods like this:
    if (color != null)
        dst.Color = getReplacement(color);
    
    ...
    // implement getReplacement method based on your requirements
    private static PdfColor getReplacement(PdfColor color)
    {
        // replace pure red RGB colors with green
        if (color is PdfRgbColor rgb)
        {
            if (rgb.R == 255 && rgb.G == 0 && rgb.B == 0)
                return new PdfRgbColor(0, 255, 0);
        }
    
        return color;
    }
    
  3. If you also need to change colors in image objects, save the image, change it, and replace it before the target.DrawImage(image.Image, 0, 0, 0); line, like this:
    string fileName = image.Image.Save(..);
    
    // change colors in the "fileName" image.
    // For example: https://stackoverflow.com/questions/17208254/how-to-change-pixel-color-of-an-image-in-c-net
    string replacementImage = changeImageColors(fileName); 
    
    image.Image.ReplaceWith(replacementImage);
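
The changeImageColors helper in step 3 is left to the reader. A minimal sketch of one way to write it is shown below, assuming System.Drawing (System.Drawing.Common, which is Windows-only on .NET 6 and later) is available and that the replacement image may be a PNG file. The class name ImageRecolor, the "_recolored.png" suffix, and the pure-red-to-green rule are illustrative assumptions, not part of Docotic.Pdf.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class ImageRecolor
{
    // Hypothetical helper: writes a recolored copy of the image next to the
    // original and returns the path of the new file.
    public static string ChangeImageColors(string fileName)
    {
        string outputPath = Path.ChangeExtension(fileName, null) + "_recolored.png";

        using (var source = new Bitmap(fileName))
        // Convert to 32-bit ARGB so that SetPixel also works when the
        // source image uses an indexed pixel format.
        using (Bitmap copy = source.Clone(
            new Rectangle(0, 0, source.Width, source.Height),
            PixelFormat.Format32bppArgb))
        {
            for (int y = 0; y < copy.Height; y++)
            {
                for (int x = 0; x < copy.Width; x++)
                {
                    Color pixel = copy.GetPixel(x, y);

                    // Replace pure red with pure green, keeping the alpha value.
                    if (pixel.R == 255 && pixel.G == 0 && pixel.B == 0)
                        copy.SetPixel(x, y, Color.FromArgb(pixel.A, 0, 255, 0));
                }
            }

            copy.Save(outputPath, ImageFormat.Png);
        }

        return outputPath;
    }
}
```

GetPixel and SetPixel are slow on large images; the question linked in step 3 discusses alternatives. Lossy formats such as JPEG rarely contain exact (255, 0, 0) pixels, so a tolerance-based comparison may be needed there.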
    

Here is the full sample code for replacing (255, 0, 0) colors with (0, 255, 0) in vector paths and text objects:

using System.Diagnostics;

namespace BitMiracle.Docotic.Pdf.Samples
{
    public static class CopyPageObjects
    {
        public static void Main()
        {
            // NOTE: 
            // When used in trial mode, the library imposes some restrictions.
            // Please visit http://bitmiracle.com/pdf-library/trial-restrictions.aspx
            // for more information.

            const string PathToFile = "CopyPageObjects.pdf";

            using (var pdf = new PdfDocument(@"your_document.pdf"))
            {
                using (PdfDocument copy = pdf.CopyPages(0, 1))
                {
                    PdfPage sourcePage = copy.Pages[0];
                    PdfPage copyPage = copy.AddPage();

                    copyPage.Rotation = sourcePage.Rotation;
                    copyPage.MediaBox = sourcePage.MediaBox;
                    if (sourcePage.CropBox != sourcePage.MediaBox)
                        copyPage.CropBox = sourcePage.CropBox;

                    PdfCanvas target = copyPage.Canvas;
                    foreach (PdfPageObject obj in sourcePage.GetObjects())
                    {
                        target.SaveState();
                        setClipRegion(target, obj.ClipRegion);

                        if (obj.Type == PdfPageObjectType.Path)
                        {
                            PdfPath path = (PdfPath)obj;
                            target.Transform(path.TransformationMatrix);

                            if (path.PaintMode == PdfDrawMode.Fill || path.PaintMode == PdfDrawMode.FillAndStroke)
                                setBrush(target.Brush, path.Brush);

                            if (path.PaintMode == PdfDrawMode.Stroke || path.PaintMode == PdfDrawMode.FillAndStroke)
                                setPen(target.Pen, path.Pen);

                            appendPath(target, path);
                            drawPath(target, path);
                        }
                        else if (obj.Type == PdfPageObjectType.Image)
                        {
                            PdfPaintedImage image = (PdfPaintedImage)obj;
                            target.TranslateTransform(image.Position.X, image.Position.Y);
                            target.Transform(image.TransformationMatrix);

                            setBrush(target.Brush, image.Brush);
                            target.DrawImage(image.Image, 0, 0, 0);
                        }
                        else if (obj.Type == PdfPageObjectType.Text)
                        {
                            PdfTextData text = (PdfTextData)obj;
                            drawText(target, text);
                        }

                        target.RestoreState();
                    }

                    copy.RemovePage(0);

                    copy.Save(PathToFile);
                }
            }

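            // UseShellExecute must be set explicitly on .NET Core / .NET 5+
            // to open the file with its default viewer.
            Process.Start(new ProcessStartInfo(PathToFile) { UseShellExecute = true });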
        }

        private static void setClipRegion(PdfCanvas canvas, PdfClipRegion clipRegion)
        {
            if (clipRegion.IntersectedPaths.Count == 0)
                return;

            PdfMatrix transformationBefore = canvas.TransformationMatrix;
            try
            {
                foreach (PdfPath clipPath in clipRegion.IntersectedPaths)
                {
                    canvas.ResetTransform();
                    canvas.Transform(clipPath.TransformationMatrix);
                    appendPath(canvas, clipPath);
                    canvas.SetClip(clipPath.ClipMode.Value);
                }
            }
            finally
            {
                canvas.ResetTransform();
                canvas.Transform(transformationBefore);
            }
        }

        private static void setBrush(PdfBrush dst, PdfBrushInfo src)
        {
            PdfColor color = src.Color;
            if (color != null)
                dst.Color = getReplacement(color);

            dst.Opacity = src.Opacity;

            var pattern = src.Pattern;
            if (pattern != null)
                dst.Pattern = pattern;
        }

        private static void setPen(PdfPen dst, PdfPenInfo src)
        {
            PdfColor color = src.Color;
            if (color != null)
                dst.Color = getReplacement(color);

            var pattern = src.Pattern;
            if (pattern != null)
                dst.Pattern = pattern;

            dst.DashPattern = src.DashPattern;
            dst.EndCap = src.EndCap;
            dst.LineJoin = src.LineJoin;
            dst.MiterLimit = src.MiterLimit;
            dst.Opacity = src.Opacity;
            dst.Width = src.Width;
        }

        private static PdfColor getReplacement(PdfColor color)
        {
            // replace pure red RGB colors with green
            if (color is PdfRgbColor rgb)
            {
                if (rgb.R == 255 && rgb.G == 0 && rgb.B == 0)
                    return new PdfRgbColor(0, 255, 0);
            }

            return color;
        }

        private static void appendPath(PdfCanvas target, PdfPath path)
        {
            foreach (PdfSubpath subpath in path.Subpaths)
            {
                foreach (PdfPathSegment segment in subpath.Segments)
                {
                    switch (segment.Type)
                    {
                        case PdfPathSegmentType.Point:
                            target.CurrentPosition = ((PdfPointSegment)segment).Value;
                            break;

                        case PdfPathSegmentType.Line:
                            PdfLineSegment line = (PdfLineSegment)segment;
                            target.CurrentPosition = line.Start;
                            target.AppendLineTo(line.End);
                            break;

                        case PdfPathSegmentType.Bezier:
                            PdfBezierSegment bezier = (PdfBezierSegment)segment;
                            target.CurrentPosition = bezier.Start;
                            target.AppendCurveTo(bezier.FirstControl, bezier.SecondControl, bezier.End);
                            break;

                        case PdfPathSegmentType.Rectangle:
                            target.AppendRectangle(((PdfRectangleSegment)segment).Bounds);
                            break;

                        case PdfPathSegmentType.CloseSubpath:
                            target.ClosePath();
                            break;
                    }
                }
            }
        }

        private static void drawPath(PdfCanvas target, PdfPath path)
        {
            switch (path.PaintMode)
            {
                case PdfDrawMode.Fill:
                    target.FillPath(path.FillMode.Value);
                    break;

                case PdfDrawMode.FillAndStroke:
                    target.FillAndStrokePath(path.FillMode.Value);
                    break;

                case PdfDrawMode.Stroke:
                    target.StrokePath();
                    break;

                default:
                    target.ResetPath();
                    break;
            }
        }

        private static void drawText(PdfCanvas target, PdfTextData td)
        {
            target.TextRenderingMode = td.RenderingMode;
            setBrush(target.Brush, td.Brush);
            setPen(target.Pen, td.Pen);

            target.TextPosition = PdfPoint.Empty;
            target.FontSize = td.FontSize;
            target.Font = td.Font;
            target.TranslateTransform(td.Position.X, td.Position.Y);
            target.Transform(td.TransformationMatrix);

            target.DrawString(td.GetCharacterCodes());
        }
    }
}
Vitaliy Shibaev
0

You're going to get into a world of hurt if you're trying to do this with PDFsharp. Have a look at this thread to see why:

Alter PDF - Text repositioning

PDF Sharp allows you to get to the building blocks of the PDF (what Adobe in their libraries calls the COS layer), but it doesn't build a graphics representation of the objects on the page.

So you would need to get to the text stream that contains all PDF graphic elements for a page, interpret this text into actual object definitions, figure out which objects you want to change and where the colouring instructions for those objects are and change them if necessary. This is far from trivial.

To give you an idea of what you would be working with, you would have to interpret something like this:

q
0 g
0 G    
0 0 200 100 re
1 0 0 0 k
(Hi!) Tj
Q   

Things would actually be slightly more complex than simply reading these types of text strings for each page, as the page can (and often does) contain "forms" which you would then have to locate in the PDF and go through the same steps with.
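
As a rough illustration of the interpretation step described above, here is a minimal sketch in plain C# (no PDF library) that scans the text of a content stream for the device colour operators g, G, rg, RG, k and K and reports the ones that set pure black. The names ColorOperatorScan and FindBlackOperators are hypothetical. The sketch assumes the stream has already been decompressed, and it simply splits on whitespace, so string operands containing spaces, arrays, inline images, colour spaces selected with cs/scn, and rich CMYK blacks are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ColorOperatorScan
{
    // Number of numeric operands each device colour operator expects.
    private static readonly Dictionary<string, int> OperandCounts = new Dictionary<string, int>
    {
        { "g", 1 }, { "G", 1 },   // DeviceGray fill / stroke
        { "rg", 3 }, { "RG", 3 }, // DeviceRGB fill / stroke
        { "k", 4 }, { "K", 4 },   // DeviceCMYK fill / stroke
    };

    // Returns the colour-setting instructions that select pure black.
    public static List<string> FindBlackOperators(string content)
    {
        var result = new List<string>();
        var operands = new List<double>();
        var operandTokens = new List<string>();

        string[] tokens = content.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            // Numbers are operands; collect them until an operator shows up.
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
            {
                operands.Add(number);
                operandTokens.Add(token);
                continue;
            }

            if (OperandCounts.TryGetValue(token, out int count)
                && operands.Count == count
                && IsBlack(token, operands))
            {
                result.Add(string.Join(" ", operandTokens) + " " + token);
            }

            // Any non-numeric token ends the current operand list.
            operands.Clear();
            operandTokens.Clear();
        }

        return result;
    }

    private static bool IsBlack(string op, List<double> v)
    {
        switch (op)
        {
            case "g":
            case "G":
                return v[0] == 0;
            case "rg":
            case "RG":
                return v[0] == 0 && v[1] == 0 && v[2] == 0;
            default: // "k" / "K": only pure K black is recognised here
                return v[0] == 0 && v[1] == 0 && v[2] == 0 && v[3] == 1;
        }
    }
}
```

For the snippet above, FindBlackOperators returns "0 g" and "0 G". The "1 0 0 0 k" instruction is not reported because it sets cyan, not black. Rewriting the values in place is a separate problem, since the modified stream still has to be written back into the PDF with correct lengths and offsets.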

I don't want to discourage you, but this really is quite a complex task with a library that has no support for parsing graphic elements.

David van Driessche
  • Does anything in that format above specify the color of the object? I was almost thinking I could just grep through the file looking for a certain *indicator* of the color attribute and then just changing it in place. I could be way off base with that though. – Bigbob556677 Jun 23 '20 at 20:12
  • Yes, it does, and no you couldn't :) PDF files aren't plain text, they allow compression of page contents and other elements (ZIP compression for page content for example). On top of that, you can't change the length of things because that will break the cross reference table at the end (typically) of the PDF file that says where all the bits are. – David van Driessche Jun 23 '20 at 20:15
  • I see. Well, maybe not grep, haha. But from within PDFsharp, could I perhaps modify that element's data to use a different color? It's basically a shot in the dark. My use case is very simple: I just need to change ALL black elements to white (make them invisible). I even considered just removing all elements that are black, but at the time that seemed harder than this. Thank you! – Bigbob556677 Jun 23 '20 at 20:19
  • It really depends how basic your documents are. If the documents are always created by the same application and you can establish what they use in terms of PDF constructs and what they don't use, you might have a chance. In the general case? Forget it: I can list you probably 10 different ways black could be expressed for an object in a PDF file. As an example "0 g", "0 G", "0 0 0 1 K" and "0 0 0 rg" all specify a black color, and those are the easy cases :) – David van Driessche Jun 23 '20 at 20:25
  • I see. Well thank you so much for your effort. I may have to go another direction with it. +1 – Bigbob556677 Jun 23 '20 at 20:27