
I am writing a small, high-level (as in simple to use) library for digitally signing PDFs generated with the WeasyPrint library (https://github.com/Kozea/WeasyPrint).

I already have it working for self-signed certificates, and I'm now working on an adapter for digital signatures from the GlobalSign DSS API (https://www.globalsign.com/en/resources/apis/api-documentation/digital-signing-service-api-documentation.html).

I've got everything working apart from LTV (Long Term Validation), which requires a DSS (Document Security Store) dictionary listing the OCSP responses and any certificates in the chain (to deal with revocation).
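For orientation, here is a minimal Python sketch of the shape such a DSS dictionary takes. The function name and the object numbers are hypothetical; in a real file each referenced object is a stream holding one DER-encoded certificate, OCSP response, or CRL, and the dictionary itself is referenced from the document Catalog under the /DSS key.

```python
def dss_dictionary(cert_refs, ocsp_refs, crl_refs=()):
    """Build the PDF syntax for a DSS dictionary from indirect object numbers.

    Hypothetical helper for illustration only: the object numbers must point
    at stream objects that were already written to the file.
    """
    def ref_array(obj_numbers):
        # "[20 0 R 21 0 R]" is an array of indirect references (generation 0)
        return "[" + " ".join(f"{n} 0 R" for n in obj_numbers) + "]"

    entries = [
        "/Type /DSS",
        f"/Certs {ref_array(cert_refs)}",
        f"/OCSPs {ref_array(ocsp_refs)}",
    ]
    if crl_refs:
        entries.append(f"/CRLs {ref_array(crl_refs)}")
    return "<< " + " ".join(entries) + " >>"


print(dss_dictionary([20, 21], [22]))
# << /Type /DSS /Certs [20 0 R 21 0 R] /OCSPs [22 0 R] >>
```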

When I add the DSS, which has to come after the signature data has been written, I get an error in Adobe Acrobat stating that the signature byterange is invalid.

How do I go about enabling the DSS feature without invalidating the byterange?

I've studied the iText library fairly intensively, but it's so abstracted that it's hard to make out the actual data being written. I've still taken the liberty of tagging iText because it is something of an industry standard for dealing with digital signatures in PDFs.

hejsan
  • Unfortunately you don't describe at all how you add the DSS to your pdf. To start with, therefore: do you add it in an incremental update or not? – mkl Apr 06 '20 at 08:05
  • @mkl I'm actually interested in knowing which method I'm doing. I'm doing it all in the same pass that adds the signature but the method feels like an incremental update since I have to make another byterange for the document timestamp and write another trailer. I'm also not quite sure if the document timestamp is needed to make the PDF show up as LTV enabled if I had done this in some other way. – hejsan Apr 10 '20 at 00:47
  • It depends on the exact LTV profile you want to create. To get something Adobe Reader considers "LTV-enabled", you don't need a time stamp. If you want a PAdES Baseline LTA signature, you normally need two time stamps. As mentioned before, your description of what you do and what you want is very vague. – mkl Apr 10 '20 at 07:23
  • In your answer you reference code. I'll try and look at it later. – mkl Apr 10 '20 at 07:29
  • @mkl "If you want a PAdES Baseline LTA signature, you normally need two time stamps." That's what I ended up doing: one is embedded in the signature dictionary and the other is a document-level timestamp that comes after the DSS has been added. The document-level one doesn't show up in Adobe Reader, but it seems to stop it from freaking out about the signature byterange being invalid because of the DSS. – hejsan Apr 10 '20 at 10:26
  • I uploaded a pdf if it helps: https://drive.google.com/file/d/1JFoR7cWekr-GsdIfKg_6xqy6vcdyxARa/view?usp=sharing – hejsan Apr 10 '20 at 10:35
  • @mkl "I'll try and look at it later." The bulk of the code is in the files globalsign.py and helpers.py. I have yet to structure it properly, as I have been working pretty much nonstop day and night to enable Uni staff to work from home due to the coronavirus. Your feedback is _very_ welcome :) – hejsan Apr 10 '20 at 11:28
  • Chances are that it'll take until after the Easter time, though. – mkl Apr 10 '20 at 13:12
  • I had a quick look at your file. One obvious error: The **Contents** of the document time stamp dictionary are prepared for a hex encoded string (which is normal) but you put the time stamp into there without hex encoding! This cannot be parsed, so at best your document time stamp is ignored and at worst PDF processors fail to read your PDF as a whole. This explains why *"the Document level one doesn't show up in Adobe Reader"*. – mkl Apr 10 '20 at 16:30
  • @mkl Thank you for taking a look. I tried hex-encoding it but it still doesn't show up. I uploaded it here, and I'd be very grateful if you have the time: https://drive.google.com/file/d/1UHSMPfZck3xgSNSe-RCQlIM6pAhEUMcI/view?usp=sharing – hejsan Apr 11 '20 at 22:29
  • The document time stamp needs to be the value of a signature field which in turn should be referenced, directly or indirectly, from the AcroForm dictionary in the document Catalog. Your document time stamp is completely unconnected. – mkl Apr 12 '20 at 19:43
  • Ah, I was wondering how it would get picked up being unreferenced like this. I thought maybe it should be referenced directly in the catalog somehow. Could I let the signature field that references the Digital Signature have an indirect reference to an array including both the Digital Signature and the Document Timestamp or do I need a separate one from the one referencing the Digital Signature? Is it customary to let timestamps have appearances or are they usually just a Rect[0 0 0 0]? Again thank you so much - people with your expertise are scarce in the wild. – hejsan Apr 13 '20 at 10:55
  • Oh never mind - that previous signature of course has to be written before the timestamp can make its byterange. So a document that has both a Digital Signature and a Document Timestamp has to have two signature fields. – hejsan Apr 13 '20 at 11:04
  • You need a separate signature form field for each pdf signature or document time stamp. Whether or not you have visible appearances for your document time stamps depends on your use case. If the use case does not require a visualization in the document, it usually is much easier not to have one. Thus, you'll very often find document time stamps without visualization. Be aware, though, if you process pdf/a documents and want them to remain pdf/a, you need an appearance stream (which may be empty) even for invisible time stamps. – mkl Apr 13 '20 at 11:10
  • Ok - now I'm afraid I might be opening another can of worms - Is my document a PDF/A document, and if not, does it need to be? – hejsan Apr 13 '20 at 11:19
  • I'm guessing - looking at the standard - that since it doesn't reference any outside files and embeds all the DSS related stuff it does indeed conform to PDF/A – hejsan Apr 13 '20 at 11:23
  • PDF/A is a family of pdf profiles for archiving, mostly concerned with accessibility of content. If pdf/a was a concern to you, you should know beforehand. Considering your comments you don't, so I don't assume it is a concern. – mkl Apr 13 '20 at 11:33
  • Alright, I've now confirmed that WeasyPrint does _not_ make PDF/A compliant PDFs. So my question is: does it need to be PDF/A to be LTV in the view of ETSI standards? – hejsan Apr 13 '20 at 11:40
  • It does not. This would only have been a concern if you had to be pdf/a aware to start with because that would mean additional requirements on signatures in general. – mkl Apr 13 '20 at 12:14
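The hex-encoding point raised in the comments above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `fill_contents` and its offset parameters are hypothetical names, the offsets are assumed to delimit the reserved `<...>` placeholder of the /Contents entry, and the placeholder here is far smaller than a real time stamp token would need.

```python
def fill_contents(pdf: bytes, contents_start: int, contents_end: int, token: bytes) -> bytes:
    """Write a DER-encoded token into a reserved /Contents placeholder.

    pdf[contents_start] is assumed to be '<' and pdf[contents_end - 1] to be '>'.
    The token is hex encoded and right-padded with '0' so that the file length,
    and therefore every offset in the /ByteRange, stays unchanged.
    """
    capacity = contents_end - contents_start - 2  # hex digits between '<' and '>'
    hex_token = token.hex().upper().encode("ascii")
    if len(hex_token) > capacity:
        raise ValueError("token does not fit in the reserved /Contents placeholder")
    padded = hex_token.ljust(capacity, b"0")
    return pdf[:contents_start + 1] + padded + pdf[contents_end - 1:]


# Hypothetical fragment of a document time stamp dictionary with a 16-digit placeholder.
fragment = b"/Type /DocTimeStamp /Contents <" + b"0" * 16 + b"> /SubFilter /ETSI.RFC3161"
start = fragment.index(b"<")
end = fragment.index(b">") + 1

filled = fill_contents(fragment, start, end, b"\x30\x82\x01\x0a")
print(filled[start:end])  # b'<3082010A00000000>'
```

Writing the raw token bytes between the angle brackets instead would leave a string that is not valid hex, which is the parsing failure described in the comments.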

1 Answer


I figured this out thanks to this beautifully named document (ETSI TS 102 778-4): "Electronic Signatures and Infrastructures (ESI); PDF Advanced Electronic Signature Profiles; Part 4: PAdES Long Term - PAdES-LTV Profile". The title may be a verbose mess, but the document is actually a very concise and helpful read.

A DSS can be added after the original byterange if you also add a document timestamp at the end of the file whose digest is taken over a second byterange covering the entire file, including the DSS. You have to enable an extension for this to work; see "Chapter 4.4 Extensions Dictionary". The linked document has the more detailed specifics.
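As a rough illustration of the byterange arithmetic, here is a minimal Python sketch. It is not the code from my library: the byte strings are heavily simplified stand-ins for real PDF revisions, `byte_range_and_digest` is a hypothetical helper, and SHA-256 is an assumed digest choice. The point it shows is that the DSS and the timestamp are only appended, so the bytes covered by the first signature stay untouched, while the timestamp's byterange spans everything except its own /Contents placeholder.

```python
import hashlib


def byte_range_and_digest(pdf: bytes, contents_start: int, contents_end: int):
    """Return the /ByteRange array and SHA-256 digest for a signature whose
    /Contents hex string occupies pdf[contents_start:contents_end],
    including the enclosing '<' and '>'."""
    byte_range = [0, contents_start, contents_end, len(pdf) - contents_end]
    digest = hashlib.sha256(pdf[:contents_start] + pdf[contents_end:]).digest()
    return byte_range, digest


# Stand-in for the revision that already contains the first signature.
signed_revision = b"%PDF-1.7\n...objects, first signature...\n%%EOF\n"

# Stand-in for the appended update: DSS first, then the timestamp dictionary
# with a reserved /Contents placeholder (real ones are several KB of zeros).
placeholder = b"<" + b"0" * 32 + b">"
update = (
    b"...DSS dictionary, OCSP and certificate streams...\n"
    b"...timestamp dictionary /Contents " + placeholder + b" ...\n"
    b"%%EOF\n"
)

# Append only; never rewrite the bytes the first signature was computed over.
pdf = signed_revision + update

contents_start = pdf.rindex(placeholder)
contents_end = contents_start + len(placeholder)
byte_range, digest = byte_range_and_digest(pdf, contents_start, contents_end)

print(byte_range)    # offsets to write into the timestamp's /ByteRange
print(digest.hex())  # digest to send to the timestamp authority
```

The digest is what goes to the timestamp authority; the returned token is then hex encoded into the placeholder without changing the file length.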

I think it's worth mentioning that I found some syntax errors in my PDF by using the Apache PDFBox utilities. I wish I had found them sooner.

If you're interested, I published the library on GitHub: https://github.com/hejsan/WeasySign. It already works but needs some touching up.

hejsan
  • The ETSI TS you reference was the original description of PAdES LTV mechanisms. There meanwhile are newer and more finalized ones, both as TS and as EN. – mkl Apr 10 '20 at 07:26
  • @mkl This was the only one I found to have actual examples of pdf markup. Could you link the newer one? I'm searching their website and there's a very long list of possibilities: https://www.etsi.org/standards#page=1&search=pdf%20pades%20long%20term&title=1&etsiNumber=1&content=1&version=0&onApproval=1&published=1&historical=1&startDate=1988-01-15&endDate=2020-04-10&harmonized=0&keyword=&TB=&stdType=&frequency=&mandate=&collection=&sort=1 – hejsan Apr 10 '20 at 10:41
  • The PAdES specifications now can be found in ETSI EN 319 142-1 and ETSI EN 319 142-2. There may be fewer *examples of pdf markup* in it but the nowadays more relevant baseline profiles are explained here, too. – mkl Apr 10 '20 at 16:45