
With the solution from this question or this one:

gs -o file-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite file.pdf

...I converted a PDF file with searchable text to one without searchable text. The file size, however, increased a hundredfold. I then used this solution:

convert -compress Zip -density 150x150 input.pdf output.pdf

...to make it smaller again. Zip compression did a better job than JPEG in terms of both image quality and resulting file size.
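
For comparison, a JPEG version can be produced with something along these lines (the quality value here is only an illustrative choice, not one I recorded):

convert -compress JPEG -quality 85 -density 150x150 input.pdf output.pdf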

I noticed, however, that as I increased the density from 150x150 to 200x200 and eventually to 500x500, the image quality got better while the file size got smaller. I seemed to be getting something for nothing, although the processing time grew with higher densities:

density   file size (kB)   processing time (s)
150x150        6430                 27
200x200        5520                 34
300x300        3504                 50
500x500        2624                105
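
Roughly, the measurements can be reproduced with a loop like this (a sketch only; it assumes GNU time is available as /usr/bin/time and that input.pdf stands in for the real file):

for d in 150 200 300 500; do
    /usr/bin/time -f '%e s' convert -compress Zip -density ${d}x${d} input.pdf output-${d}.pdf
    du -k output-${d}.pdf
done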

Does anyone understand what is going on?

Update: sorry, I just realised that every time I increase the density, it is losing pages! That is not acceptable; it explains the behaviour, but not why the pages are disappearing. Will investigate.
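
A quick way to see the page loss is to compare the page counts before and after (assuming pdfinfo from poppler-utils is installed):

pdfinfo input.pdf | grep Pages
pdfinfo output.pdf | grep Pages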

  • Post your input PDF so others can test your numbers. – fmw42 Apr 21 '21 at 15:10
  • @fmw42 It's too private to post; do you think this is unexpected behaviour, then? – cardamom Apr 21 '21 at 16:29
  • Yes, it surprises me unless there is something about the LZW compression that I am missing. Increasing the density means that ImageMagick via Ghostscript will rasterize the image at larger dimensions. Doubling the density means 4x the number of pixels. Is it the same image, and are you sure there is no change from kB to MB? Can you reproduce this with some image that you can post? – fmw42 Apr 21 '21 at 16:44
  • @fmw42 I just realised (and commented above) that as the density is increased, pages are disappearing. There are supposed to be about 70 pages but by the time it is at 500x500 there are only 18 left. So something else is going on – cardamom Apr 21 '21 at 16:58
  • You are running out of either RAM or space in your /tmp directory. Check your /tmp or /temp directory to see if there are ImageMagick files left there and remove them. Also remove any large files, or just remove all files in your /tmp directory. – fmw42 Apr 21 '21 at 17:36
  • Thanks, you are right, and this is very demanding of /tmp. It is easier said than done, though; I am investigating why /tmp is so full on my Debian system. – cardamom Apr 21 '21 at 19:59
  • If you have leftover ImageMagick files there: any time it runs out of RAM (big file processing), it will leave temporary files in /tmp. When ImageMagick succeeds, the files are removed automatically. When you double the density, you increase the number of pixels by 4x, so 4 times the amount of RAM is needed. I cannot say why other files are being left. – fmw42 Apr 21 '21 at 20:06
  • See [resources](https://www.imagemagick.org/script/resources.php) on how to change from `/tmp` with env var `MAGICK_TEMPORARY_PATH=/path`. – meuh Apr 22 '21 at 06:28
  • Thanks @meuh I did two things. First was to edit `/etc/fstab` to put `/tmp` into RAM ('tmpfs'), which let it go to 8 GB; `/tmp` was in the root partition, which was almost full, and I did not want to risk repartitioning the HDD right now. Good to know what needs to be done, though, if more horsepower for image processing is needed. Secondly, I found `policy.xml` in `/etc/ImageMagick-6/` and increased various numbers in there (a rough command sketch follows after these comments). So now the odd behaviour is gone, and I can go to higher densities without it omitting pages. – cardamom Apr 22 '21 at 09:39
  • I also saw this line in there `` but chose not to uncomment or move it. – cardamom Apr 22 '21 at 09:45
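
Putting the comments together, the cleanup and workaround look roughly like this (a sketch only: the magick-* temporary-file pattern and the /path/with/more/space directory are assumptions, and the final line just reuses the convert command from the question with a redirected temporary directory):

identify -list resource
ls -lh /tmp/magick-*
rm /tmp/magick-*
MAGICK_TEMPORARY_PATH=/path/with/more/space convert -compress Zip -density 500x500 input.pdf output.pdf

The first line prints ImageMagick's current resource limits, the next two look for and (only when no convert job is running) remove leftover temporary files, and the last line points ImageMagick's temporary directory somewhere with more space than /tmp.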

0 Answers