6

I'm looking for an SVG to PDF converter that preserves the text in the SVG. I've tried Batik, Inkscape, and CairoSVG. The PDF generated by each of them is a bitmap image, including the text; the text cannot be selected or searched in a PDF viewer. None of them renders the SVG very well either, especially CairoSVG.

I followed the directions here (note that you don't have to compile FOP - you can download the PDF transcoder from here). Now I see that if I zoom into the PDF, the clarity is preserved, which I assume means the text is preserved. However, I cannot search or select the text.

Also, I compared the output of the PDF transcoder from FOP with the output of the one in Batik, and I see no difference.

user2233706
  • Can you give more detail about how you are converting an SVG document to PDF with Inkscape? I've converted all sorts of text-based documents with Inkscape, and the text is not converted to an image for me. – halfer Apr 07 '13 at 20:52
  • I see that it is. The quality of the PDF generated is worse than Batik's, so I guess I didn't bother trying to highlight anything. My original problem is solved by Inkscape, but the generated PDF is not something I want to use. The SVG is not rendered as well as in other viewers. – user2233706 Apr 07 '13 at 21:29
  • _The quality of the PDF generated is worse than Batik_ - in what way exactly? Would you provide an example of your input SVG and output PDF as processed by Inkscape? (Edit: Thomas makes a very good point that gradients could force a renderer to do what you describe; nevertheless, for a question like this, input and output files should be offered, otherwise readers are just guessing). – halfer Apr 07 '13 at 22:10
  • Here is the PDF generated by [Chrome](http://i.minus.com/1365473866/MNroL0cTyCYGeprWrWerEQ/dnhjdkxDuzdWx/test_chrome.pdf), [Batik](http://i.minus.com/1365473859/zIy1No1js9u1HqAJXFNwXQ/dt4NyL6KeOWO7/test_batik.pdf), and [Inkscape](http://i.minus.com/1365473871/R5LVdmppLypW06oGiSaq4w/dbts6FAqpt5hG2/test_inkscape.pdf). The [original SVG](http://i.minus.com/ibiAMqcboj8DBe.svg) is best viewed in Chrome or Batik. The SVG was generated by a program, and I had to snip parts out to come up with a minimal example. Chrome has the best PDF, but the problem is that the page size is limited. – user2233706 Apr 08 '13 at 02:33
  • Hmm, a puzzler; the Inkscape PDF seems to be fine text-wise, but the page-icon background doesn't render to PDF well, since it is converted to a bitmap. This is because the background contains colour transform filters, which seem to be unnecessary anyway. Removing the filters in Inkscape, and setting flat colour normally, made the affected elements render fine. Are you able to redraw the page using ordinary graphics primitives, with no filters? It seems it could do with simplifying anyway - it seems to have too many constituent objects. – halfer Apr 08 '13 at 07:29
  • The program I used allows you to export to SVG and internally they use their own format. The program blindly uses filter effects when not needed. I guess it is possible to modify the SVG such that it only uses ordinary primitives. A PNG generated by Batik is fine, though. I want to note that only Chrome and Batik rendered the original SVG correctly; all the other programs created small gaps within a page icon. – user2233706 Apr 08 '13 at 15:55
  • If nothing else, report the issue to the makers of that program, as a bug - I wonder what they tested the SVG output with? If you're really stuck, writing a program to fix up the XML would do the trick. – halfer Apr 08 '13 at 17:54
  • This is really horrible SVG! Why is it using filters instead of fill/stroke properties? It's also using the same IDs multiple times. – Thomas W Apr 08 '13 at 22:22
  • @user2233706: this is a very interesting question. If a bounty would help on this, and you're still working on it, let me know and I'll add one. – halfer Apr 09 '13 at 09:33
  • Firstly, I appreciate all the effort everybody put into this. Since I could not find a satisfactory solution, I decided to resign myself to living with a PNG image. However, it would still be nice if the PDF could be generated properly. Furthermore, my original question was just about preserving the text, but I'd also like the background page rendering to be good. A bounty would be nice for the sake of learning something new. – user2233706 Apr 10 '13 at 03:11

3 Answers

1

If you're using filters, gradients or masking, it might be impossible to translate the SVG 1:1 to PDF. In these cases, converters usually rasterize the vector data to achieve a similar visual appearance, rather than preserve the vector data and end up with a very different look.
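To see which parts of an SVG might trigger this fallback, the file can be scanned for the features mentioned above. The following is only a rough sketch: the list of "risky" elements and attributes is my own assumption for illustration and is not exhaustive, the function name `find_risky_features` is made up, and the sample filter id is hypothetical (it merely follows the `colorFilter-` naming pattern of the SVG in question).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Assumed, non-exhaustive lists chosen for illustration only.
RISKY_ELEMENTS = ("filter", "mask", "linearGradient", "radialGradient")
RISKY_ATTRIBUTES = ("filter", "mask")


def find_risky_features(svg_text):
    """Return (kind, name, detail) tuples for features a converter may rasterize."""
    root = ET.fromstring(svg_text)
    findings = []
    for elem in root.iter():
        # Tags look like "{namespace}localname"; keep only SVG elements.
        if elem.tag.startswith("{" + SVG_NS + "}"):
            local = elem.tag.split("}", 1)[1]
            if local in RISKY_ELEMENTS:
                findings.append(("element", local, elem.get("id")))
        # Attributes that reference a filter or mask defined elsewhere.
        for attr in RISKY_ATTRIBUTES:
            if attr in elem.attrib:
                findings.append(("attribute", attr, elem.get(attr)))
    return findings


# Hypothetical sample input, not the asker's actual file.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<defs><filter id="colorFilter-6699cc"><feColorMatrix/></filter></defs>'
    '<use filter="url(#colorFilter-6699cc)"/>'
    '<text>Test</text>'
    '</svg>'
)

for finding in find_risky_features(sample):
    print(finding)
```

An empty result does not guarantee vector output; it only means none of the listed features were found.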

Edit: In your example case, we can make sure that fill attributes are used instead of filters with the help of the following XSLT transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inside symbols, let the fill follow the color of the referencing element -->
  <xsl:template match="@fill[ancestor::svg:symbol]" priority="1">
    <xsl:attribute name="fill">currentColor</xsl:attribute>
  </xsl:template>

  <!-- Replace a filter reference with a color attribute, taking the six hex digits from the filter name -->
  <xsl:template match="@filter[starts-with(.,'url(#colorFilter-')]">
    <xsl:attribute name="color">
      <xsl:value-of select="concat('#',substring(.,18,6))"/>
    </xsl:attribute>
  </xsl:template>

  <!-- Elements that used no filter are drawn in white -->
  <xsl:template match="svg:use[not(@filter)]">
    <xsl:copy>
      <xsl:attribute name="color">#fff</xsl:attribute>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This relies entirely on how the filters are named in this particular SVG, so it's not applicable to anything else. The colors aren't quite right, though. I'd be very interested in learning why this color matrix:

0.4 0   0   0 0
0   0.6 0   0 0
0   0   0.8 0 0
0   0   0   1 0

applied to white obviously does not result in rgba(40%,60%,80%,1).
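One possible explanation, offered as an assumption about this file rather than a confirmed diagnosis: in SVG, filter primitives operate in the linearRGB color space by default (the `color-interpolation-filters` property), unless the document sets it to `sRGB`. If that default applies here, the matrix scales linear-light values, and converting the result back to sRGB gives lighter colors than 40%/60%/80%. The sketch below works through the arithmetic; the function names are made up for illustration, and it ignores alpha premultiplication, which does not matter for opaque white.

```python
def srgb_to_linear(c):
    """Convert one sRGB channel (0..1) to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Convert one linear-light channel (0..1) back to sRGB."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def apply_color_matrix(rgba, matrix):
    """Multiply a 4x5 color matrix with the vector (R, G, B, A, 1)."""
    vector = tuple(rgba) + (1.0,)
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


def filter_in_linear_rgb(srgb_rgba, matrix):
    """Apply the matrix in linearRGB and return the result as sRGB."""
    r, g, b, a = srgb_rgba
    linear = (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b), a)
    out = [min(max(c, 0.0), 1.0) for c in apply_color_matrix(linear, matrix)]
    return tuple(linear_to_srgb(c) for c in out[:3]) + (out[3],)


# The matrix quoted above.
MATRIX = [
    [0.4, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

white = (1.0, 1.0, 1.0, 1.0)
result = filter_in_linear_rgb(white, MATRIX)
print(["{:.1%}".format(c) for c in result])
```

Under that assumption, white comes out at roughly rgba(66.5%, 79.8%, 90.6%, 1) instead of rgba(40%, 60%, 80%, 1), which would account for colors that look lighter than the hex values taken from the filter names.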

Thomas W
  • My SVG is generated by another program. I reduced it to the following that only displays text: ` Test ` and ran the following command: `java -jar batik-rasterizer.jar -m application/pdf ..\test.svg -font-family Arial` The text is not selectable in a PDF viewer. The text is converted to a vector image. – user2233706 Apr 07 '13 at 19:09
  • I cannot reproduce your problem. I'm doing this: `rasterizer -m application/pdf test.svg` with the Batik rasterizer, and I can select the text without problems in Adobe Reader or Evince. What PDF viewer are you working with? – Thomas W Apr 08 '13 at 21:06
  • Firstly, thanks for spending time on this. The SVG now renders properly in Firefox. However, it no longer renders at all in Chrome. Secondly, I am using PDF-XChange Viewer on Windows. I also have Adobe Reader 8.1, and I see the same problem there. I installed Evince and got the same results. When you say select, does that mean you can copy it? And can you search for it? I assumed it is a vector image because when I zoom in, the quality is maintained, but you can see the text is not smooth, as if it was drawn with multiple paths. Also, with your transformation, the text is still not selectable. – user2233706 Apr 09 '13 at 00:44
1

Have a look at rsvg-convert, part of librsvg. I have used it to convert SVG documents to PDF and it preserves text such that it is selectable and searchable in PDF viewers.

Here is a blog post comparing it to some other options, and showing how to use it: https://www.itsfullofstars.de/tag/rsvg-convert/
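As a rough sketch of how this could be scripted, assuming `rsvg-convert` is installed and on the PATH: the `-f` (format) and `-o` (output file) options are part of the tool, while the helper names and file names below are made up for illustration.

```python
import shutil
import subprocess


def build_rsvg_command(svg_path, pdf_path):
    """Build the rsvg-convert command line for an SVG to PDF conversion."""
    return ["rsvg-convert", "-f", "pdf", "-o", pdf_path, svg_path]


def convert_svg_to_pdf(svg_path, pdf_path):
    """Run rsvg-convert; raises if the tool is missing or the conversion fails."""
    if shutil.which("rsvg-convert") is None:
        raise FileNotFoundError("rsvg-convert was not found on PATH")
    subprocess.run(build_rsvg_command(svg_path, pdf_path), check=True)


# Hypothetical file names.
print(" ".join(build_rsvg_command("test.svg", "test.pdf")))
```

Calling `convert_svg_to_pdf("test.svg", "test.pdf")` would then write the PDF. Whether filtered elements stay vector in the output will still depend on the SVG itself.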

Travis G.
0

Have you tried printing the SVG to a PDF printer?

mark stephens
  • Yes. The problem with that is that my SVG is very large. It's probably a few feet. Printing to PDF results in multiple pages of output. – user2233706 Apr 06 '13 at 19:56
  • Google Chrome does an excellent job of converting an SVG to a PDF; it preserves the original SVG better than anything else. The PDF is selectable in a PDF viewer, which is what I want. However, the output page size is limited to letter or similarly sized paper. I don't want to be limited to any paper size. – user2233706 Apr 07 '13 at 19:14
  • I meant "the PDF **text** is selectable in a PDF viewer." – user2233706 Apr 08 '13 at 02:50