
After reading this Stack Overflow question (and other pages, referenced in the code comments below), I came up with a PHP script that, given a digitally signed PDF file, reports who signed it:

<?php
function der2pem($der_data) {

    // https://www.php.net/manual/en/ref.openssl.php

    $pem = chunk_split(base64_encode($der_data), 64, "\n");
    $pem = "-----BEGIN CERTIFICATE-----\n".$pem."-----END CERTIFICATE-----\n";
    return $pem;
}

function extract_pkcs7_signatures($path_to_pdf) {

    // https://stackoverflow.com/q/46430367

    $content = file_get_contents($path_to_pdf);

    $regexp = '/ByteRange\ \[\s*(\d+) (\d+) (\d+)/';

    $result = [];
    preg_match_all($regexp, $content, $result);

    $signatures = null;

    if (isset($result[2]) && isset($result[3]) && isset($result[2][0]) && isset($result[3][0])) {
        $start = $result[2][0];
        $end = $result[3][0];
        if ($stream = fopen($path_to_pdf, 'rb')) {
            $signatures = stream_get_contents($stream, $end - $start - 2, $start + 1);
            fclose($stream);
            $signatures = hex2bin($signatures);
        }
    }

    return $signatures;
}

function who_signed($path_to_pdf) {

    // https://www.php.net/manual/en/openssl.certparams.php
    // https://www.php.net/manual/en/function.openssl-pkcs7-read.php
    // https://www.php.net/manual/en/function.openssl-x509-parse.php

    $signers = [];

    $signatures = extract_pkcs7_signatures($path_to_pdf);
    if (!empty($signatures)) {
        $pem = der2pem($signatures);
        $certificates = array();
        $result = openssl_pkcs7_read($pem, $certificates);
        if ($result) {
            foreach ($certificates as $certificate) {
                $certificate_data = openssl_x509_parse($certificate);
                $signers[] = $certificate_data['subject']['CN'];
            }
        }
    }

    return $signers;
}

$path_to_pdf = 'test.pdf';

// In case you want to test the extract_pkcs7_signatures() function:

/*
$signatures = extract_pkcs7_signatures($path_to_pdf);
$path_to_pkcs7 = pathinfo($path_to_pdf, PATHINFO_FILENAME) . '.pkcs7';
file_put_contents($path_to_pkcs7, $signatures);
echo shell_exec("openssl pkcs7 -inform DER -in $path_to_pkcs7 -print_certs -text");
exit;
*/

var_dump(who_signed($path_to_pdf));
?>

This is plain command-line PHP; you don't need to run any Composer commands beforehand to run this script.

For some test1.pdf, signed by just one person (let's call her ALICE), this script returns:

array(4) {
  [0]=>
  string(23) "CERTIFICATE AUTHORITY 1"
  [1]=>
  string(23) "CERTIFICATE AUTHORITY 2"
  [2]=>
  string(5) "ALICE"
  [3]=>
  string(5) "ALICE"
}

For some test2.pdf, signed by two people (let's call them BOB and CAROL), this script returns:

array(4) {
  [0]=>
  string(23) "CERTIFICATE AUTHORITY 1"
  [1]=>
  string(3) "BOB"
  [2]=>
  string(23) "CERTIFICATE AUTHORITY 2"
  [3]=>
  string(23) "CERTIFICATE AUTHORITY 3"
}

The problem is that this script's output does not match what pdfsig reports for the same files.

For the same test1.pdf, pdfsig returns:

Digital Signature Info of: test1.pdf
Signature #1:
  - Signer Certificate Common Name: ALICE
...

For the same test2.pdf, pdfsig returns:

Digital Signature Info of: test2.pdf
Signature #1:
  - Signer Certificate Common Name: BOB
...
Signature #2:
  - Signer Certificate Common Name: CAROL
...

What am I doing wrong? I mean, what do I need to do to correctly identify the person (or the people) who signed a PDF file?

Antônio Medeiros

1 Answer

My previous script didn't consider the following:

  • A PDF file may have one or more signatures (PKCS#7 files), each one indicated by a ByteRange array. I found this by reading the Digital Signatures in a PDF spec; the solution proposed by @Denis Alimov reads only the first ByteRange.
  • A PKCS#7 file may contain many certificates, including certificate authority certificates and people's certificates (we are interested in people's certificates only).
  • A PKCS#7 file may contain duplicate certificates (if you know why, please tell me; this is just what I found in the sample PDFs I have).
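To make the first point concrete, here is a minimal sketch of how a ByteRange array points at a signature. It assumes the array has the form [offset1 length1 offset2 length2] and that the hex-encoded signature sits in the gap between the two ranges, wrapped in "<" and ">". The function name signature_blobs_from_string() and the toy string are made up for illustration; the string is not a real PDF, and its last ByteRange number is a placeholder.

```php
<?php
// Hypothetical helper: returns the decoded bytes of every signature whose
// ByteRange array is found in $pdf_contents.
function signature_blobs_from_string(string $pdf_contents): array {
    $blobs = [];
    $regexp = '/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]/';
    preg_match_all($regexp, $pdf_contents, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $gap_start = (int) $match[1] + (int) $match[2]; // position of "<"
        $gap_end = (int) $match[3];                     // position just after ">"
        // Skip the "<" and ">" delimiters and decode the hex string
        $hex = substr($pdf_contents, $gap_start + 1, $gap_end - $gap_start - 2);
        $bin = hex2bin($hex);
        if ($bin !== false) {
            $blobs[] = $bin;
        }
    }
    return $blobs;
}

// Toy input (an assumption for the demo, not a real PDF): the "signature"
// is just the hex encoding of the word Hello.
$prefix = '/Contents ';
$hex = bin2hex('Hello');
$gap_start = strlen($prefix);
$gap_end = $gap_start + strlen($hex) + 2;
// The fourth number (99) is a placeholder; this sketch does not use it.
$toy = $prefix . '<' . $hex . '>' . " /ByteRange [0 $gap_start $gap_end 99]";

$blobs = signature_blobs_from_string($toy);
```

With a real file, each returned blob would be one DER-encoded PKCS#7 structure, so a PDF signed twice should yield two blobs.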

Here is my current working script, whose output matches pdfsig's:

<?php
function der2pem($der_data) {

    // https://www.php.net/manual/en/ref.openssl.php

    $pem = chunk_split(base64_encode($der_data), 64, "\n");
    $pem = "-----BEGIN CERTIFICATE-----\n".$pem."-----END CERTIFICATE-----\n";
    return $pem;
}

function extract_pkcs7_signatures($path_to_pdf) {

    // https://stackoverflow.com/q/46430367

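    $pdf_contents = file_get_contents($path_to_pdf);

    // A ByteRange array is [offset1 length1 offset2 length2]: the two byte
    // ranges covered by the signature. The gap between them holds the
    // hex-encoded PKCS#7 data, delimited by "<" and ">".
    $regexp = '/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)/';

    $result = [];
    preg_match_all($regexp, $pdf_contents, $result);

    $signatures = [];

    if (!empty($result[0])) {
        $signature_count = count($result[0]);
        for ($s = 0; $s < $signature_count; $s++) {
            // The gap starts where the first range ends and stops where
            // the second range begins
            $start = (int) $result[1][$s] + (int) $result[2][$s];
            $end = (int) $result[3][$s];
            // Skip the "<" and ">" delimiters and decode the hex string
            $signature = hex2bin(substr($pdf_contents, $start + 1, $end - $start - 2));
            if ($signature !== false) {
                $signatures[] = $signature;
            }
        }
    }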

    return $signatures;
}

function who_signed($path_to_pdf) {

    // https://www.php.net/manual/en/openssl.certparams.php
    // https://www.php.net/manual/en/function.openssl-pkcs7-read.php
    // https://www.php.net/manual/en/function.openssl-x509-parse.php

    $signers = [];

    $pkcs7_der_signatures = extract_pkcs7_signatures($path_to_pdf);
    if (!empty($pkcs7_der_signatures)) {
        $parsed_certificates = [];
        foreach ($pkcs7_der_signatures as $pkcs7_der_signature) {
            $pkcs7_pem_signature = der2pem($pkcs7_der_signature);
            $pem_certificates = [];
            $result = openssl_pkcs7_read($pkcs7_pem_signature, $pem_certificates);
            if ($result) {
                foreach ($pem_certificates as $pem_certificate) {
                    $parsed_certificate = openssl_x509_parse($pem_certificate);
                    $parsed_certificates[] = $parsed_certificate;
                }
            }
        }

        // Remove certificate authority certificates

        $people_certificates = [];
        foreach ($parsed_certificates as $certificate_a) {
            $is_authority = false;
            foreach ($parsed_certificates as $certificate_b) {
                if ($certificate_a['subject'] == $certificate_b['issuer']) {
                    // If certificate A is the issuer of certificate B, then
                    // certificate A belongs to a certificate authority and,
                    // therefore, should be ignored
                    $is_authority = true;
                    break;
                }
            }
            if (!$is_authority) {
                $people_certificates[] = $certificate_a;
            }
        }

        // Remove duplicate certificates

        $distinct_certificates = [];
        foreach ($people_certificates as $certificate_a) {
            $is_duplicated = false;
            if (count($distinct_certificates) > 0) {
                foreach ($distinct_certificates as $certificate_b) {
                    if (
                        ($certificate_a['subject'] == $certificate_b['subject']) &&
                        ($certificate_a['serialNumber'] == $certificate_b['serialNumber']) &&
                        ($certificate_a['issuer'] == $certificate_b['issuer'])
                    ) {
                        // If certificate B has the same subject, serial number
                        // and issuer as certificate A, then certificate B is a
                        // duplicate and, therefore, should be ignored
                        $is_duplicated = true;
                        break;
                    }
                }
            }
            if (!$is_duplicated) {
                $distinct_certificates[] = $certificate_a;
            }
        }

        foreach ($distinct_certificates as $certificate) {
            $signers[] = $certificate['subject']['CN'];
        }
    }

    return $signers;
}

$path_to_pdf = 'test.pdf';

// In case you want to test the extract_pkcs7_signatures() function:

/*
$signatures = extract_pkcs7_signatures($path_to_pdf);
for ($s = 0; $s < count($signatures); $s++) {
    $path_to_pkcs7 = pathinfo($path_to_pdf, PATHINFO_FILENAME) . $s . '.pkcs7';
    file_put_contents($path_to_pkcs7, $signatures[$s]);
    echo shell_exec("openssl pkcs7 -inform DER -in $path_to_pkcs7 -print_certs -text");
}
exit;
*/

var_dump(who_signed($path_to_pdf));
?>
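The subject/issuer comparison above only recognizes an authority when one of the certificates it issued is also present. As a possible alternative, here is a minimal sketch that relies on the basicConstraints extension, which openssl_x509_parse() exposes under the 'extensions' key as a string such as "CA:TRUE". It also removes duplicates by keying on issuer and serial number. The helper names is_ca_certificate() and signer_names() are made up for this sketch, and the arrays are hand-written stand-ins for parsed certificates, not real ones. Treat it as a heuristic: a certificate without basicConstraints is assumed to belong to a person.

```php
<?php
// Hypothetical helper: true when a parsed certificate declares itself a CA
// through the basicConstraints extension.
function is_ca_certificate(array $parsed_certificate): bool {
    $constraints = $parsed_certificate['extensions']['basicConstraints'] ?? '';
    return is_string($constraints) && stripos($constraints, 'CA:TRUE') !== false;
}

// Hypothetical helper: common names of the non-CA certificates, with
// duplicates (same issuer and serial number) collapsed into one entry.
function signer_names(array $parsed_certificates): array {
    $distinct = [];
    foreach ($parsed_certificates as $certificate) {
        if (is_ca_certificate($certificate)) {
            continue;
        }
        $key = serialize([
            $certificate['issuer'] ?? null,
            $certificate['serialNumber'] ?? null,
        ]);
        $distinct[$key] = $certificate['subject']['CN'] ?? null;
    }
    return array_values($distinct);
}

// Hand-written stand-ins for openssl_x509_parse() results (assumed shape)
$ca = ['CN' => 'CERTIFICATE AUTHORITY 1'];
$parsed_certificates = [
    ['subject' => $ca, 'issuer' => $ca, 'serialNumber' => '1',
     'extensions' => ['basicConstraints' => 'CA:TRUE']],
    ['subject' => ['CN' => 'ALICE'], 'issuer' => $ca, 'serialNumber' => '42',
     'extensions' => ['basicConstraints' => 'CA:FALSE']],
    ['subject' => ['CN' => 'ALICE'], 'issuer' => $ca, 'serialNumber' => '42',
     'extensions' => ['basicConstraints' => 'CA:FALSE']],
];

$signers = signer_names($parsed_certificates);
```

Keying on issuer plus serial number mirrors the duplicate check in the script above, since that pair is meant to identify a certificate uniquely.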
Antônio Medeiros
  • All signatures have a **ByteRange** entry (which can be written differently, though, and so be invisible to your script), but meanwhile there are other entities, too, with **ByteRange** entries. – mkl Mar 30 '23 at 22:19
  • What about pkcs12? I've tried your script but it shows blank result – frozenade Jun 13 '23 at 05:35