I am documenting a system I maintain. This documentation contains a diagram I created in TeX/TikZ, which gets rendered to a PDF file. I then convert the PDF file to an image file (PNG, via ImageMagick) and include it in my HTML documentation. Works great.

Now I would like to create an image map for the image, so that I can add hyperlinks/mouseovers/etc. This is an image that I expect to update periodically based on changes in my system, so I would like to automate this process if possible.

Is there a way to use a software library or tool to automatically create image maps of the various text content in the PDF file, when it gets rendered to PNG?

Here is an example from this gist I created:

[Image: the example diagram rendered from the gist]

In this case I would like to turn some of the various text strings into hyperlinks by locating their bounding box in the PDF:

  • controller
  • actuator
  • sensor
  • A
  • B
  • C
  • D
  • u
  • y
  • F(s)
  • G(s)
  • H(s)

(They are all text content in the PDF file; I can select the text of any of them in Acrobat Reader and copy + paste into my text editor.)

Is there a way to do this?

Jason S
  • I think you should add a further bounty of 250 (or even more) for this solution. This man has worked on it for you for a long time! – Bharata Jan 25 '19 at 05:53
  • Another route to consider would be to convert the TeX file to SVG. SVG supports clickable hrefs – RoyM Sep 19 '21 at 19:58

2 Answers

I was able to put together the following Python solution, which could serve as a starting point. It converts the first page of the PDF to a PNG and prints the corresponding image map markup.

It takes the output DPI as an optional argument (--dpi, default 200). pdfminer reports bounding boxes in PDF points (1/72 inch), so the DPI is needed to scale them onto the PNG:

from pdf2image import convert_from_path
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage

from yattag import Doc, indent

import argparse
import os


def transform_coords(lobj, mb):

    # Transform LTTextBox bounding box to image map area bounding box.
    #
    # The bounding box of each LTTextBox is specified as:
    #
    # x0: the distance from the left of the page to the left edge of the box
    # y0: the distance from the bottom of the page to the lower edge of the box
    # x1: the distance from the left of the page to the right edge of the box
    # y1: the distance from the bottom of the page to the upper edge of the box
    #
    # So the y coordinates start from the bottom of the image. But with image map
    # areas, y coordinates start from the top of the image, so here we subtract
    # the bounding box's y-axis values from the total height.

    return [lobj.x0, mb[3] - lobj.y1, lobj.x1, mb[3] - lobj.y0]


def get_imagemap(d):
    doc, tag, text = Doc().tagtext()
    with tag("map", name="map"):
        for k, v in d.items():
            doc.stag("area", shape="rect", coords=",".join(v), href="#", alt=k)
    return indent(doc.getvalue())


def get_bboxes(pdf, dpi):
    rsrcmgr = PDFResourceManager()
    device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    # Only the first page is processed. The file has to stay open while
    # pdfminer interprets the page, so do that inside the with block.
    with open(pdf, "rb") as fp:
        page = list(PDFPage.get_pages(fp))[0]
        interpreter.process_page(page)
    layout = device.get_result()

    # pdfminer reports bounding boxes in PDF points (1/72 inch). I could not
    # find a way to change this, so instead I scale each coordinate by dpi/72
    # to get pixel coordinates in the rendered image.
    scale = dpi / 72.0

    return {
        lobj.get_text().strip(): [
            str(int(x * scale)) for x in transform_coords(lobj, page.mediabox)
        ]
        for lobj in layout
        if isinstance(lobj, LTTextBox)
    }


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("pdf")
    parser.add_argument("--dpi", type=int, default=200)

    args = parser.parse_args()

    page = list(convert_from_path(args.pdf, args.dpi))[0]
    page.save(f"{os.path.splitext(args.pdf)[0]}.png", "PNG")

    print(get_imagemap(get_bboxes(args.pdf, args.dpi)))


if __name__ == "__main__":
    main()

Example result:

<img src="https://i.stack.imgur.com/aXWMc.png" usemap="#map">
<map name="map">
  <area shape="rect" coords="361,8,380,43" href="#" alt="B" />
  <area shape="rect" coords="434,31,500,64" href="#" alt="G(s)" />
  <area shape="rect" coords="432,93,502,117" href="#" alt="actuator" />
  <area shape="rect" coords="552,8,572,42" href="#" alt="C" />
  <area shape="rect" coords="596,58,609,86" href="#" alt="y" />
  <area shape="rect" coords="105,26,119,40" href="#" alt="+" />
  <area shape="rect" coords="107,54,122,78" href="#" alt="−" />
  <area shape="rect" coords="35,58,51,86" href="#" alt="u" />
  <area shape="rect" coords="164,8,182,43" href="#" alt="A" />
  <area shape="rect" coords="163,152,183,187" href="#" alt="D" />
  <area shape="rect" coords="241,31,311,64" href="#" alt="H(s)" />
  <area shape="rect" coords="236,94,316,118" href="#" alt="controller" />
  <area shape="rect" coords="243,175,309,208" href="#" alt="F (s)" />
  <area shape="rect" coords="247,234,305,258" href="#" alt="sensor" />
</map>
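
To turn only some of the strings into real links, the href values still have to be filled in. The following is a minimal sketch of that step, not part of the script above: area_tags and the links dictionary are names made up for illustration, and the URL in the usage example is a placeholder. It takes a dictionary shaped like the return value of get_bboxes, strips whitespace from each label (so that "F (s)" matches "F(s)"; this would also collapse multi-word labels), and emits an area only for labels that have a link.

```python
from html import escape


def area_tags(bboxes, links, map_name="map"):
    """Build image map markup for the labels that have a link.

    bboxes: {label: [x0, y0, x1, y1]}, as returned by get_bboxes above.
    links:  {label: url}, keyed by the label with all whitespace removed
            (hypothetical input, supplied by whoever maintains the docs).
    """
    lines = ['<map name="{}">'.format(escape(map_name, quote=True))]
    for label, coords in bboxes.items():
        # pdfminer may insert spaces inside a label, e.g. "F (s)".
        key = "".join(label.split())
        if key not in links:
            continue  # leave unlinked text (such as "+") out of the map
        lines.append(
            '  <area shape="rect" coords="{}" href="{}" alt="{}" />'.format(
                ",".join(str(c) for c in coords),
                escape(links[key], quote=True),
                escape(key, quote=True),
            )
        )
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    boxes = {"F (s)": ["243", "175", "309", "208"], "+": ["105", "26", "119", "40"]}
    print(area_tags(boxes, {"F(s)": "feedback.html"}))
```

Because the map is regenerated from the PDF each time, only the links dictionary needs to be maintained by hand when the diagram changes.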
cody

Hmm. I found the Apache PDFBox library, which contains an example called PrintLocations.java. It does print location information, but I'm not sure how to interpret it, and it reports one location per glyph.

> java -jar print_text_locations.jar blockdiagram_example.pdf
String[37.864998,13.939003 fs=4.9813 xscale=4.9813 height=2.49065 space=2.4906502 width=5.1197815]+
String[59.185997,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.6450577]A
String[130.229,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.64505]B
String[198.783,13.498001 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]C
String[86.827,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.699257]H
String[97.449005,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[102.00201,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[107.51601,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
String[156.35,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.234192]G
String[165.58301,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[170.136,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.513733]s
String[175.65,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
String[12.797,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=5.7035875]u
String[38.711,27.432999 fs=4.9813 xscale=4.9813 height=3.4022279 space=2.4906502 width=5.39624]?
String[214.641,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=4.884659]y
String[85.109,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]c
String[88.5959,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[92.473335,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[96.35077,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387131]t
String[98.28948,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[100.611755,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[104.48919,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[106.03738,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[107.58556,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[111.463,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[155.67801,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[159.55544,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4868927]c
String[163.04233,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[164.98105,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]u
String[168.85847,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[172.7359,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[174.67462,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]o
String[178.55205,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.322281]r
String[58.912003,65.483 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]D
String[87.536,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=7.577202]F
String[96.740005,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[101.29201,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[106.80601,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.5525436])
String[88.983,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[92.4699,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[96.347336,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[100.22477,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[103.71167,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[107.5891,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r

I did make a small change, however, to print the string as well. It looks like the writeString method gets called once for each item of text, so I guess I could find the overall bounding rectangle for each string from its glyphs:

/**
 * Override the default functionality of PDFTextStripper.
 */
@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException
{
    System.out.println("text string: "+string);
    for (TextPosition text : textPositions)
    {
        System.out.println( "String[" + text.getXDirAdj() + "," +
                text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale=" +
                text.getXScale() + " height=" + text.getHeightDir() + " space=" +
                text.getWidthOfSpace() + " width=" +
                text.getWidthDirAdj() + "]" + text.getUnicode() );
    }
}

Output from the PDF file in the GitHub gist:

> java -jar pdf2imagemap.jar blockdiagram_example.pdf
text string: +
String[37.864998,13.939003 fs=4.9813 xscale=4.9813 height=2.49065 space=2.4906502 width=5.1197815]+
text string: A
String[59.185997,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.6450577]A
text string: B
String[130.229,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.64505]B
text string: C
String[198.783,13.498001 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]C
text string: H(s)
String[86.827,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.699257]H
String[97.449005,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[102.00201,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[107.51601,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
text string: G(s)
String[156.35,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.234192]G
String[165.58301,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[170.136,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.513733]s
String[175.65,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
text string: u
String[12.797,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=5.7035875]u
text string: ?
String[38.711,27.432999 fs=4.9813 xscale=4.9813 height=3.4022279 space=2.4906502 width=5.39624]?
text string: y
String[214.641,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=4.884659]y
text string: controller
String[85.109,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]c
String[88.5959,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[92.473335,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[96.35077,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387131]t
String[98.28948,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[100.611755,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[104.48919,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[106.03738,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[107.58556,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[111.463,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
text string: actuator
String[155.67801,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[159.55544,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4868927]c
String[163.04233,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[164.98105,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]u
String[168.85847,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[172.7359,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[174.67462,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]o
String[178.55205,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.322281]r
text string: D
String[58.912003,65.483 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]D
text string: F
String[87.536,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=7.577202]F
text string: (s)
String[96.740005,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[101.29201,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[106.80601,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.5525436])
text string: sensor
String[88.983,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[92.4699,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[96.347336,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[100.22477,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[103.71167,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[107.5891,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
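
One way to get that overall rectangle is to post-process this output: group the glyph lines under each "text string:" header and take the union of the glyph boxes. The Python sketch below does that. It rests on an assumption I have not confirmed: that the x,y pair is the left end of each glyph's baseline, in points, measured from the top-left of the page, so each glyph spans x to x + width and y - height to y (descenders are ignored). parse_strings and union_rect are names made up for this sketch.

```python
import re

# One glyph line of the output above, e.g.
# String[86.827,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=... width=9.699257]H
GLYPH_RE = re.compile(
    r"String\[(?P<x>[\d.]+),(?P<y>[\d.]+) fs=[\d.]+ xscale=[\d.]+ "
    r"height=(?P<h>[\d.]+) space=[\d.]+ width=(?P<w>[\d.]+)\]"
)
HEADER = "text string: "


def parse_strings(lines):
    """Return [(text, [(x, y, height, width), ...]), ...] in input order."""
    groups = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(HEADER):
            groups.append((line[len(HEADER):], []))
            continue
        m = GLYPH_RE.match(line)
        if m and groups:
            glyph = tuple(float(m.group(k)) for k in ("x", "y", "h", "w"))
            groups[-1][1].append(glyph)
    return groups


def union_rect(glyphs, dpi=200):
    """Union of the glyph boxes as [left, top, right, bottom] in pixels.

    Assumes y is the baseline measured downward from the top of the page,
    so the top of a glyph is y - height.
    """
    scale = dpi / 72.0  # PDF points are 1/72 inch
    left = min(x for x, y, h, w in glyphs)
    top = min(y - h for x, y, h, w in glyphs)
    right = max(x + w for x, y, h, w in glyphs)
    bottom = max(y for x, y, h, w in glyphs)
    return [int(v * scale) for v in (left, top, right, bottom)]


if __name__ == "__main__":
    import sys

    # Usage: java -jar pdf2imagemap.jar blockdiagram_example.pdf | python this_script.py
    for text, glyphs in parse_strings(sys.stdin):
        if glyphs:
            print(text, ",".join(str(v) for v in union_rect(glyphs)))
```

For the four H(s) glyphs above at 200 dpi this gives 241,42,311,59. The left and right edges agree with the 241,31,311,64 that the pdfminer answer reports for H(s); the top and bottom are tighter, presumably because pdfminer works from the font's line height rather than the glyph height. Note that F(s) is reported as two separate strings ("F" and "(s)"), so those would still need to be merged.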
Jason S