4

I have a base PDF file and want to change its title to Chinese text (UTF-8) using Ghostscript and pdfmark, with a command like the one below:

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=result.pdf base.pdf pdfmarks

The pdfmarks file (encoded as UTF-8 without a BOM) is:

[ /Title (敏捷开发)
/Author (Larry Cai)
/Producer (xdvipdfmx (0.7.8))
/DOCINFO pdfmark

The command executes successfully, but when I check the properties of result.pdf:

The title has been changed to æŁ‘æ“·å¼•å‘

Please give me hints on how to solve this. Are there any parameters for the gs command or pdfmark that would help?

mu is too short
Larry Cai

3 Answers

6

The PDF Reference states that the Title entry in the document info dictionary is of type 'text string'. Text strings are defined as using either PDFDocEncoding or UTF-16BE with a Byte Order Mark (see page 158 of the 1.7 PDF Reference Manual).

So you cannot specify a Title using UTF-8 without a BOM.

I would imagine that if you replace the Title string with a string defining the content using UTF-16BE with a BOM then it will work properly. I would suggest you use a hex string rather than a regular PostScript string to specify the data, simply for ease of use.
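To make the difference concrete, here is a minimal sketch that dumps the title's bytes in both encodings. It assumes a POSIX shell with `od`, `tr`, `sed` and `iconv` available; the helper name `hex_bytes` is made up for this illustration. The UTF-8 bytes are what the pdfmarks file in the question actually contains, and a PDF viewer reads them one byte at a time as PDFDocEncoding, which is where the garbled title comes from.

```shell
# Hypothetical helper: print stdin as space-separated lowercase hex bytes on one line.
hex_bytes() {
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

title='敏捷开发'

# UTF-8 bytes, as stored in the pdfmarks file from the question.
utf8_hex=$(printf '%s' "$title" | hex_bytes)

# The same text as UTF-16BE, with the FE FF byte order mark prepended by hand.
utf16_hex="fe ff $(printf '%s' "$title" | iconv -f UTF-8 -t UTF-16BE | hex_bytes)"

echo "UTF-8:    $utf8_hex"
echo "UTF-16BE: $utf16_hex"
```

The second line, written without spaces and wrapped in angle brackets, is the hex string form that can be used as the /Title value.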

KenS
    Thanks. With the tool from http://blog.tremily.us/posts/PDF_bookmarks_with_Ghostscript/ I managed to get it to work. The tool converts the string to "UTF-16BE with a Byte Order Mark". – Larry Cai Feb 09 '12 at 07:42
2

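Building on the idea from Happyman Chiu, here is my solution. Get a UTF-16BE hex string with a BOM by converting the text explicitly to UTF-16BE, dumping it byte by byte (so the result does not depend on the machine's byte order), and prepending FEFF:

printf '%s' '敏捷开发' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F' | sed 's/^/<FEFF/;s/$/>/'

You will get <FEFF654F63775F0053D1>. The parentheses are not part of the input: they only delimit a PostScript literal string, and the angle brackets of the hex string take their place. Substitute this for the title:

/Title <FEFF654F63775F0053D1>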
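Putting it together, a sketch of the complete pdfmarks file and the Ghostscript call might look like the following. The hex string encodes only the four characters 敏捷开发 (FEFF followed by 654F 6377 5F00 53D1); the other entries and the file names are taken from the question. The guard around `gs` is an addition for this sketch so that the snippet does nothing harmful when Ghostscript or base.pdf is absent.

```shell
# Write the pdfmarks file; the quoted 'EOF' stops the shell from expanding anything inside.
cat > pdfmarks <<'EOF'
[ /Title <FEFF654F63775F0053D1>
/Author (Larry Cai)
/Producer (xdvipdfmx (0.7.8))
/DOCINFO pdfmark
EOF

# Same Ghostscript invocation as in the question; skipped if gs or base.pdf is missing.
if command -v gs >/dev/null 2>&1 && [ -f base.pdf ]; then
  gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=result.pdf base.pdf pdfmarks
fi
```

Because the title is now plain ASCII hex, the encoding of the pdfmarks file itself no longer matters for that entry.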
0

Following up on the related question "pdfmark for docinfo metadata in pdf is not accepting accented characters in Keywords or Subject":

I use this function to convert a UTF-8 string into the form needed in info.txt, which is then passed to the gs command.

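  // Convert a UTF-8 string to a PDF hex string: UTF-16BE preceded by the FEFF byte order mark.
  function str_in_pdf($str){
    // PHP's iconv() avoids the shell, so quotes in $str cannot break a command line
    // and no trailing newline from echo ends up in the result.
    $utf16 = iconv('UTF-8', 'UTF-16BE', $str);
    return "<FEFF" . strtoupper(bin2hex($utf16)) . ">";
  }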
Community