
I read "Create a tiff with only text and no images from a postscript file with ghostscript" and tried to use KenS's answer. But this method removes only "black" images, i.e. images that contain data only in the black channel (the PDF uses the CMYK color space). How can I remove all images in my case?

Community
WebRacer

3 Answers


This does a better job, but it's incomplete. For example, it doesn't deal with images that use multiple data sources. It's essentially untested. The one exception is your smaller file (pages.pdf): I used ps2write to convert it to PostScript, then ran the PostScript program below on the pdfwrite device to convert it back to PDF.

One of the first things you will notice is that almost all the text has vanished from your document. That's because the fonts you are using are bitmap fonts, and the program can't tell the difference between a bitmap representing a character, and any other kind of bitmap. For this file you can solve that by removing the definition of imagemask because all the characters use imagemask, and the other images use 'image'.
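You can drop that definition without hand-editing by using a `sed` range delete. This is a minimal sketch. It assumes the program is saved to a file (the usage line calls it `noimage.ps`) and keeps the layout shown here: the definition starts on a line beginning `/imagemask {` and ends at the next line beginning `} bind def`. The function name and the output file name are hypothetical.

```sh
#!/bin/sh
# Print a copy of the prologue without the /imagemask redefinition, so that
# imagemask keeps its normal behaviour and bitmap-font text is still drawn.
# Assumption: the definition starts on a line beginning "/imagemask {" and
# ends on the next line beginning "} bind def" (the layout of this program).
strip_imagemask_redefinition() {
  sed '/^\/imagemask {/,/^} bind def/d' "$1"
}

# Example (hypothetical file names):
#   strip_imagemask_redefinition noimage.ps > noimage-keeptext.ps
```

The `/image {` and `/colorimage {` definitions do not match the start pattern, so they stay in place.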

I have a sneaky suspicion the formatting of the program is going to get messed up here :-(

8<------------------------------8<--------------------------8<-------------------------
%!

% 
% numbytes -file- ConsumeFileData -
%
/ConsumeFileData {
  userdict begin
  /DataString 256 string def
  /DataFile exch def
  /BytesToRead exch def

%(BytesToRead = ) print BytesToRead ==
  mark
  {
    DataFile DataString readstring {                    % read bytes
      /BytesToRead BytesToRead 256 sub def              % not EOF subtract 256 from required amount.
%(Read 256 bytes) ==
%(BytesToRead now = ) print BytesToRead ==
    } {
      length 
%(Read ) print dup 256 string cvs print (bytes) ==
      BytesToRead exch sub /BytesToRead exch def % Reached EOF, subtract length read from required amount
%(BytesToRead now = ) print BytesToRead ==
      exit                                              % and exit loop 
    } ifelse
  } loop

%BytesToRead ==
  BytesToRead 0 gt {
    (Ran out of image data reading from DataSource\n) ==
  } if
  cleartomark
  end
} bind def

% 
% numbytes -proc- ConsumeProcData -
%
/ConsumeProcData {
userdict begin
  /DataProc exch def
  /BytesToRead exch def

  {
    DataProc exec length                              % procedure returns a string; use its length
    dup 0 eq { pop exit } if                          % empty string: no more data, stop
    BytesToRead exch sub                              % subtract # bytes read
    /BytesToRead exch def
    BytesToRead 0 le {
      exit                                            % exit when read enough
    } if
  } loop
end
} bind def

/image {
 (image) ==
 dup type /dicttype eq { 
  dup /MultipleDataSources known {
    dup /MultipleDataSources get {
      (Can't handle image with multiple sources!) ==
    } if
  } if
  dup /Width get                 % stack = -dict- width
  exch dup /BitsPerComponent get % stack = width -dict- bpc
  exch dup /Decode get           % stack = width bpc -dict- decode
  length 2 div                   % decode = 2 * num components
  exch 4 1 roll                  % stack = -dict- width bpc ncomps
  mul mul                        % stack = -dict- width*bpc*ncomps
  7 add cvi 8 idiv               % stack = -dict- width(bytes) 
  exch dup /Height get           % stack = width -dict- height
  exch /DataSource get           % stack = width height DataSource
  3 1 roll                       % stack = DataSource width height
  mul                            % stack = DataSource width*height
  exch                           % stack = size DataSource
 } {
  % Level 1 form: width height bits/sample matrix DataSource
  exch pop                  % throw away matrix: stack = w h bpc DataSource
  4 1 roll                  % stack = DataSource w h bpc
  3 -1 roll mul             % stack = DataSource h w*bpc
  7 add 8 idiv              % bytes per row: floor((w*bpc + 7) / 8)
  mul                       % stack = DataSource size
  exch                      % stack = size DataSource
 } ifelse

 dup type /filetype eq { 
  ConsumeFileData
 } {
   dup type /arraytype eq
   1 index type /packedarraytype eq or {
    ConsumeProcData
   } {
    pop pop                  % Remove DataSource and size
   } ifelse
 } ifelse
} bind def

/imagemask {
(imagemask)==
 dup type /dicttype eq { 
  dup /MultipleDataSources known {
    dup /MultipleDataSources get {
      (Can't handle imagemask with multiple sources!) ==
    } if
  } if
  dup /Width get                 % stack = -dict- width
  7 add cvi 8 idiv               % bytes per row: floor((width + 7) / 8), 1 bit per sample
  exch dup /Height get           % stack = rowbytes -dict- height
  exch /DataSource get           % stack = rowbytes height DataSource
  3 1 roll                       % stack = DataSource rowbytes height
  mul                            % stack = DataSource rowbytes*height
  exch                           % stack = size DataSource
 } {
  % Level 1 form: width height polarity matrix DataSource
  exch pop                  % throw away matrix: stack = w h polarity DataSource
  exch pop                  % throw away polarity: stack = w h DataSource
  3 1 roll                  % stack = DataSource w h
  exch 7 add 8 idiv         % stack = DataSource h rowbytes
  mul                       % stack = DataSource size
  exch                      % stack = size DataSource
 } ifelse

 dup type /filetype eq { 
  ConsumeFileData
 } {
   dup type /arraytype eq
   1 index type /packedarraytype eq or {
    ConsumeProcData
   } {
    pop pop                  % Remove DataSource and size
   } ifelse
 } ifelse
} bind def

/colorimage {
(colorimage)==
  1 index {                    % 'multi' is true: one DataSource per component
    (Can't handle colorimage with multiple sources!) ==
    dup 6 add { pop } repeat   % discard ncomp, multi, the sources, matrix, bpc, h, w
  } {
    exch pop                   % get rid of 'multi'  stack: w h bpc m d ncomp
    3 -1 roll pop              % drop the matrix     stack: w h bpc d ncomp
    exch 5 1 roll              % stack: d w h bpc ncomp
    mul                        % stack: d w h bpc*ncomp
    3 -1 roll mul              % stack: d h w*bpc*ncomp
    7 add 8 idiv               % stack: d h bytes-per-row
    mul exch                   % stack: bytes d

    dup type /filetype eq {
      ConsumeFileData
    } {
      dup type /arraytype eq
      1 index type /packedarraytype eq or {
        ConsumeProcData
      } {
        pop pop                % Remove DataSource and size
      } ifelse
    } ifelse
  } ifelse
} bind def
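Each redefinition passes a byte count to ConsumeFileData or ConsumeProcData, and that count must match what the real operator would read. The PostScript Language Reference states that every image row starts on a byte boundary. The count is therefore the row size rounded up to whole bytes, multiplied by the height. Computing it over the whole image at once can under-count: 9x2 pixels at 1 bit is 4 bytes, not 3. Here is the same arithmetic as a shell sketch. The function name is hypothetical, and the function covers only the single-DataSource case.

```sh
#!/bin/sh
# image_data_bytes WIDTH HEIGHT BITS_PER_COMPONENT COMPONENTS
# Bytes of sample data an image operator reads from a single DataSource.
# Each row is padded to a whole byte, so round the row up, then multiply.
image_data_bytes() {
  w=$1 h=$2 bpc=$3 ncomp=$4
  row_bytes=$(( (w * bpc * ncomp + 7) / 8 ))   # ceil(bits per row / 8)
  echo $(( row_bytes * h ))
}

# imagemask: 1 bit per sample, 1 component
#   image_data_bytes 9 2 1 1      -> 4 (each row is 2 bytes)
# CMYK colorimage, 8 bits per component
#   image_data_bytes 100 50 8 4   -> 20000
```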
Kurt Pfeifle
KenS
  • Thanks a lot for your work! Unfortunately I can't use it without formatting, because I don't know how to format it well... – WebRacer Jun 28 '11 at 12:08
  • It still doesn't work; the invoked file only works for PS files without images :/ – Joe Jun 29 '11 at 08:00
  • It worked for me with the smaller of the two sample files. As I remarked at the time, though, for that file it also removes all the text, since the text is drawn as imagemasks. Since the redefinitions only affect image operations, it's hard to see how it could have any effect on a file that doesn't contain images... It will (as I said) only work for PostScript files, with Ghostscript. – KenS Jun 29 '11 at 12:10
  • That probably means it's exercising a path through the code that I didn't test. Like I said, I only tested the smaller of the files you posted. You should be able to fix/extend the program easily enough. – KenS Jul 01 '11 at 07:41
  • @KenS: I did the formatting. It's easy: either use `...` tags, or indent each line by (at least) 4 spaces (and separate the code from the previous paragraph by one empty line). – Kurt Pfeifle Jul 18 '11 at 17:48

That technique should work for images in any colour, because the image operator is used for both colour and monochrome images. The exception is a file that uses the obsolete Level 1.5 'colorimage' operator. I can't recall whether I redefined that operator in the example; if not, you can redefine it in a similar fashion.

In fact, I see that I offered redefinitions for image, colorimage and imagemask, so all image types should be elided. Perhaps you could share an example?

KenS
  • Thanks for your answer! Test files: http://array02.letmeprint.ru/noimages/cover.pdf (26M) http://array02.letmeprint.ru/noimages/pages.pdf (140K) http://array02.letmeprint.ru/noimages/noimage.ps Try to cut the images out of these files... – WebRacer Jun 27 '11 at 06:00
  • These are PDF files, but your question said 'a PostScript file'. As I mentioned in the original posting, the PDF interpreter effectively uses the 'system' versions of these operators, so this technique won't work. Instead, first convert the PDF to PostScript using the ps2write device, then render the PostScript to TIFF with this prologue, and it should be OK. – KenS Jun 27 '11 at 10:44
  • In fact there is a small bug in that code, and it doesn't like 'inline' images. I should fix that. I'll try to post a better piece of code later. – KenS Jun 27 '11 at 10:51
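The two-step route described above might look like this in a shell script. First ps2write converts the PDF to PostScript. Then that PostScript runs with the prologue in front of it. This is a sketch, not part of the answer: the function name, the intermediate file name and the device choices in the examples are assumptions. `tiff32nc` is a CMYK TIFF device, and `pdfwrite` reproduces the round trip used for testing. `GS` can point at another executable, such as `gswin64c` on Windows.

```sh
#!/bin/sh
# Two-step image removal: PDF -> PostScript with ps2write, then render that
# PostScript with the image-removing prologue run first.
GS=${GS:-gs}   # override e.g. with GS=gswin64c

# remove_images_via_ps IN.pdf PROLOGUE.ps OUT DEVICE [extra gs options...]
remove_images_via_ps() {
  in_pdf=$1 prologue=$2 out=$3 device=$4
  shift 4
  tmp_ps="${out%.*}.ps"     # intermediate PostScript, named after the output

  # Step 1: PDF -> PostScript, so the page content calls image/imagemask/
  # colorimage by name instead of going through the PDF interpreter.
  "$GS" -q -o "$tmp_ps" -sDEVICE=ps2write "$in_pdf" || return 1

  # Step 2: the prologue redefines the image operators, then the converted
  # file runs with those redefinitions in place.
  "$GS" -q -o "$out" -sDEVICE="$device" "$@" "$prologue" "$tmp_ps"
}

# Examples (file names are placeholders):
#   remove_images_via_ps pages.pdf noimage.ps pages-noimg.tif tiff32nc -r300
#   remove_images_via_ps pages.pdf noimage.ps pages-noimg.pdf pdfwrite
```

Passing the prologue and the converted file in one `gs` invocation keeps the redefinitions active while the second file runs.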

To remove all raster images from a PDF you can use the Ghostscript command from Kurt's answer on Ask Ubuntu: command line - How to remove images from a PDF file:

gs -o noIMG.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

The removal is achieved by passing the filter as the command-line flag -dFILTERIMAGE:

If set, this will ignore all images in the input (in this context image means a bitmap), these will therefore not be rendered.

See also his answer on: How can I remove all images from a PDF?.
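The question's original goal was a TIFF, so the same filter can presumably be passed when rendering straight to a TIFF device, with no PostScript round trip. This is a hedged sketch. It assumes a Ghostscript release recent enough to support `-dFILTERIMAGE`. The function name and the `tiff32nc` / `-r300` choices are illustrative. `GS` can be overridden as before.

```sh
#!/bin/sh
GS=${GS:-gs}   # override e.g. with GS=gswin64c

# pdf_to_tiff_without_images IN.pdf OUT.tif
# Render the PDF to a CMYK TIFF at 300 dpi with raster images filtered out.
pdf_to_tiff_without_images() {
  "$GS" -q -o "$2" -sDEVICE=tiff32nc -r300 -dFILTERIMAGE "$1"
}

# Example (placeholder names):
#   pdf_to_tiff_without_images input.pdf noIMG.tif
```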

hc_dev