
I have a large collection of PDFs with vector graphics (mostly lines and curves) that I need to batch edit to modify basic stroke properties such as corner types (line joins) and end types (line caps). The same approach would ideally also work for line thicknesses and colors.

I am already using iTextSharp to edit these PDFs to insert an image into the background of every file, but I don't have much documentation on curves and lines and I cannot find a way to edit the lines themselves. I'm open to other libraries as well, but I haven't found one that clearly addresses how to edit existing curves and lines, only draw new ones.

using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// open the reader
PdfReader reader = new PdfReader(refPath);
Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);

// open the writer
FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write);
PdfWriter writer = PdfWriter.GetInstance(document, fs);
document.Open();

// the pdf content
PdfContentByte cb = writer.DirectContent;

//get an image to be inserted.
var screenshot = System.Drawing.Image.FromFile("somefile.png");

//Create iTextSharp image
Image bg = Image.GetInstance(screenshot, System.Drawing.Imaging.ImageFormat.Png);
bg.SetDpi(dpi, dpi);
bg.ScaleToFit(size);
bg.SetAbsolutePosition(0, 0);
bg.Alignment = Image.UNDERLYING;

cb.AddImage(bg);

/**
Get and edit linework properties in here somewhere???
**/

// create the new page and add it to the pdf
PdfImportedPage page = writer.GetImportedPage(reader, 1);
cb.AddTemplate(page, 0, 0);

// close the streams
document.Close();
fs.Close();
writer.Close();
reader.Close();

Ideally the output of all of the lines would look something like this:

Any ideas are appreciated!

sc_o
  • Can you share an example PDF containing a representative set of such *Before* style curves? I have an idea how to fairly easily implement that using a `PdfContentStreamEditor` (as presented in [this answer](https://stackoverflow.com/a/35915789/1729265)). – mkl Feb 18 '19 at 11:17
  • Did the answers help? Or are there still open questions? – mkl Mar 04 '19 at 15:02

2 Answers

1

It is tempting to try and round trip this kind of thing via a format one is comfortable with. So perhaps a conversion to SVG, then manipulation, then back to PDF.

However I would encourage you to stay away from this type of temptation because such a round-trip will inevitably lead to distortion and loss.

Instead I would encourage you to work directly with the raw PDF operator stream. It looks a bit daunting at first but actually it's pretty simple once you get the hang of it. For example (percent signs introduce comments),

q  % save state
0 0 10 10 re % define rectangle path
s % close and stroke path
Q % restore state

The Adobe PDF Specification will give you all the detail. It's big but it's well written and clear. See Annex A for a list of all the operators and links to the relevant sections.

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
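
As a rough illustration of which operators matter for this question: in the specification, J sets the line cap (0 butt, 1 round, 2 projecting square), j sets the line join (0 miter, 1 round, 2 bevel), w sets the line width, and RG sets the stroke color in DeviceRGB. The sketch below only formats such a prologue as content stream text. `StrokeOps` and `Build` are hypothetical names made up for this example, not part of any PDF library.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not from any PDF library): formats stroke-related
// graphics state operators as PDF content stream text.
public static class StrokeOps
{
    // cap:  0 = butt, 1 = round, 2 = projecting square  (operator J)
    // join: 0 = miter, 1 = round, 2 = bevel             (operator j)
    public static string Build(int cap, int join, double width, double r, double g, double b)
    {
        // PDF numbers use '.' as the decimal separator, so format with the
        // invariant culture rather than the machine's locale.
        CultureInfo inv = CultureInfo.InvariantCulture;
        StringBuilder sb = new StringBuilder();
        sb.Append(cap.ToString(inv)).Append(" J\n");
        sb.Append(join.ToString(inv)).Append(" j\n");
        sb.Append(width.ToString("0.###", inv)).Append(" w\n");
        sb.Append(r.ToString("0.###", inv)).Append(' ')
          .Append(g.ToString("0.###", inv)).Append(' ')
          .Append(b.ToString("0.###", inv)).Append(" RG\n");
        return sb.ToString();
    }
}
```

Calling `StrokeOps.Build(1, 1, 2.5, 1, 0, 0)` yields the four instructions for round caps, round joins, a 2.5 unit line width and a red stroke color. Other stroke parameters such as the miter limit (M) and dash pattern (d) are left out here to keep the sketch short.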

So then the issue becomes how do you work with your existing content stream?

Parsing these things is non-trivial so I would suggest you work with a tool. For example ABCpdf will allow you to break the stream up into atoms, modify the sequence and then slot them back into the original document. For a code example see,

https://www.websupergoo.com/helppdfnet/default.htm?page=source%2F7-abcpdf.atoms%2Fopatom%2F1-methods%2Ffind.htm

This is a pretty elegant and powerful mechanism in terms of parsing and manipulation. I'm sure there are other tools that allow similar things but I know about ABCpdf. :-)

0

Your image shows that you want to edit line caps and line joins in all paths to be round.

Unfortunately you did not share representative example files, so I had to construct one myself with a mix of different cap and join styles and a path shape resembling yours:

Paths.pdf

For your task, I'd propose using the generic PdfContentStreamEditor from this answer (https://stackoverflow.com/a/35915789/1729265), as it does all the heavy lifting and lets us concentrate on the task at hand.

Thus, what does our stream editor implementation have to do? It has to set cap and join styles to "round" and prevent these settings from being overridden. Looking into the PDF specification we see that cap and join styles are parameters of the current graphics state and can either be set directly using the J and j instructions respectively or via the LC and LJ entries in a Graphics State Parameter Dictionary.

Thus, we can implement our stream editor simply by first initializing cap and join styles to round, then dropping all J and j instructions, and re-initializing cap and join styles after each graphics state gs instruction.

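using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Forces round line caps and joins; PdfContentStreamEditor is the class from the referenced answer.
class PathMakeCapAndJoinRound : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        // Before the first instruction is written, set cap and join to round.
        if (start)
        {
            initializeCapAndJoin(processor);
            start = false;
        }
        // Drop the original J and j instructions so they cannot override the round styles.
        if (CAP_AND_JOIN_OPERATORS.Contains(operatorLit.ToString()))
        {
            return;
        }
        base.Write(processor, operatorLit, operands);
        // A graphics state dictionary may carry LC or LJ entries, so re-apply round styles after gs.
        if (GSTATE_OPERATOR == operatorLit.ToString())
        {
            initializeCapAndJoin(processor);
        }
    }

    // Writes "1 J" and "1 j"; the operand list ends with the operator itself.
    void initializeCapAndJoin(PdfContentStreamProcessor processor)
    {
        PdfLiteral operatorLit = new PdfLiteral("J");
        List<PdfObject> operands = new List<PdfObject> { new PdfNumber(PdfContentByte.LINE_CAP_ROUND), operatorLit };
        base.Write(processor, operatorLit, operands);

        operatorLit = new PdfLiteral("j");
        operands = new List<PdfObject> { new PdfNumber(PdfContentByte.LINE_JOIN_ROUND), operatorLit };
        base.Write(processor, operatorLit, operands);
    }

    List<string> CAP_AND_JOIN_OPERATORS = new List<string> { "j", "J" };
    string GSTATE_OPERATOR = "gs";
    bool start = true;
}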

Applying it like this to the PDF above

using (PdfReader pdfReader = new PdfReader(testDocument))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(@"Paths-Rounded.pdf", FileMode.Create, FileAccess.Write), (char)0, true))
{
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PathMakeCapAndJoinRound();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}

we get the result:

Paths-Rounded.pdf
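
Since the question is about a large collection of files, here is a minimal sketch of a batch driver, assuming all PDFs sit in one folder and the results go to a different folder. `PdfBatch`, `Run` and the `editOne` callback are hypothetical names for this example; the callback is where the PdfReader/PdfStamper code above would go, with the source and target paths in place of the fixed ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch driver: walks a folder of PDFs and hands each
// input/output path pair to a caller-supplied edit routine.
public static class PdfBatch
{
    // Returns the source paths for which editOne threw an exception.
    public static List<string> Run(string inputDir, string outputDir, Action<string, string> editOne)
    {
        Directory.CreateDirectory(outputDir);
        List<string> failed = new List<string>();
        foreach (string source in Directory.EnumerateFiles(inputDir, "*.pdf"))
        {
            string target = Path.Combine(outputDir, Path.GetFileName(source));
            try
            {
                editOne(source, target);
            }
            catch (Exception e)
            {
                // Keep going so one damaged file does not stop the batch.
                Console.Error.WriteLine(source + ": " + e.Message);
                failed.Add(source);
            }
        }
        return failed;
    }
}
```

Writing to a separate output folder keeps the originals intact, and the returned list makes it easy to inspect the files that could not be processed.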


Beware, the restrictions from the referenced answer remain. In particular this editor only edits the page content stream. For a complete solution you have to also edit all form XObject and Pattern streams, and also deal with annotations.
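
The question also mentions thicknesses and colors. The same drop-and-re-inject pattern should carry over: line width is set by w (or the LW entry of a graphics state dictionary), and device stroke colors by RG, G and K. The sketch below models the idea on plain text operations so that it runs without iTextSharp. `Op` and `StrokeOverride` are hypothetical names, and in a real editor the same decisions would be made inside the Write override.

```csharp
using System;
using System.Collections.Generic;

// Library-free model of the "drop and re-inject" idea. An operation is an
// operator name plus its operands as text; a real editor would receive
// these from a content stream parser instead.
public sealed class Op
{
    public readonly string Operator;
    public readonly string[] Operands;
    public Op(string op, params string[] operands) { Operator = op; Operands = operands; }
    public override string ToString()
    {
        return Operands.Length == 0 ? Operator : string.Join(" ", Operands) + " " + Operator;
    }
}

public static class StrokeOverride
{
    // Operators that set the line width or a device stroke color directly.
    static readonly HashSet<string> Dropped = new HashSet<string> { "w", "RG", "G", "K" };

    public static List<Op> Apply(IEnumerable<Op> input, string width, string r, string g, string b)
    {
        List<Op> output = new List<Op>();
        // Establish the forced values before any original instruction.
        output.Add(new Op("w", width));
        output.Add(new Op("RG", r, g, b));
        foreach (Op op in input)
        {
            if (Dropped.Contains(op.Operator))
                continue; // the original setting must not override ours
            output.Add(op);
            if (op.Operator == "gs")
                output.Add(new Op("w", width)); // a gs dictionary may carry LW
        }
        return output;
    }
}
```

This sketch leaves the lowercase fill color operators (rg, g, k) alone and does not handle stroke colors set through CS, SC or SCN; those would need the same treatment in a complete solution.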


To allow reproduction, this is how I created my test document:

byte[] createMixedPathsPdf()
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        using (Document document = new Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
            document.Open();
            var canvas = writer.DirectContent;
            canvas.SetLineWidth(10);

            canvas.MoveTo(100, 700);
            canvas.CurveTo(180, 720, 180, 720, 200, 800);
            canvas.CurveTo(220, 720, 220, 720, 350, 700);
            canvas.MoveTo(350, 700);
            canvas.CurveTo(220, 680, 220, 680, 210, 650);
            canvas.Stroke();

            canvas.SetLineCap(PdfContentByte.LINE_CAP_BUTT);
            canvas.SetLineJoin(PdfContentByte.LINE_JOIN_BEVEL);
            canvas.SetGState(createGState(PdfContentByte.LINE_CAP_BUTT, PdfContentByte.LINE_JOIN_BEVEL));
            canvas.MoveTo(100, 500);
            canvas.CurveTo(180, 520, 180, 520, 200, 600);
            canvas.CurveTo(220, 520, 220, 520, 350, 500);
            canvas.MoveTo(350, 500);
            canvas.CurveTo(220, 480, 220, 480, 210, 450);
            canvas.Stroke();

            canvas.SetLineCap(PdfContentByte.LINE_CAP_PROJECTING_SQUARE);
            canvas.SetLineJoin(PdfContentByte.LINE_JOIN_MITER);
            canvas.SetGState(createGState(PdfContentByte.LINE_CAP_PROJECTING_SQUARE, PdfContentByte.LINE_JOIN_MITER));
            canvas.MoveTo(100, 300);
            canvas.CurveTo(180, 320, 180, 320, 200, 400);
            canvas.CurveTo(220, 320, 220, 320, 350, 300);
            canvas.MoveTo(350, 300);
            canvas.CurveTo(220, 280, 220, 280, 210, 250);
            canvas.Stroke();

            canvas.SetLineCap(PdfContentByte.LINE_CAP_ROUND);
            canvas.SetLineJoin(PdfContentByte.LINE_JOIN_ROUND);
            canvas.SetGState(createGState(PdfContentByte.LINE_CAP_ROUND, PdfContentByte.LINE_JOIN_ROUND));
            canvas.MoveTo(100, 100);
            canvas.CurveTo(180, 120, 180, 120, 200, 200);
            canvas.CurveTo(220, 120, 220, 120, 350, 100);
            canvas.MoveTo(350, 100);
            canvas.CurveTo(220, 80, 220, 80, 210, 50);
            canvas.Stroke();
        }
        return memoryStream.ToArray();
    }
}

PdfGState createGState(int lineCap, int lineJoin)
{
    PdfGState pdfGState = new PdfGState();
    pdfGState.Put(new PdfName("LC"), new PdfNumber(lineCap));
    pdfGState.Put(new PdfName("LJ"), new PdfNumber(lineJoin));
    return pdfGState;
}
mkl