I have a PDF from which I want to remove all images and other drawing content, and then save the result as a new PDF.
I know how to remove text by filtering out the TJ and Tj operators, which I currently detect as below:
op.getOperation().equals( "TJ")
Instead of removing the TJ and Tj operators, is it possible to copy these text operators into another PDF file with the formatting intact, so that the new PDF turns out to be a pure text-only PDF? It's OK if text drawn with operators other than Tj and TJ is lost.
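To make the idea concrete, here is a minimal sketch of the filtering I have in mind. It is plain Java and does not use PDFBox: it assumes the content stream has already been tokenized into strings (in PDFBox the tokens would come from parsing the page's content stream). The class name `TextOnlyFilter`, the `isOperand` heuristic, and the set of operators kept outside text objects are my own assumptions, and inline images (BI ... EI) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: keeps BT ... ET blocks and drops other drawing operators.
class TextOnlyFilter {
    // Operators kept outside text objects because they change the graphics
    // state (save/restore and transformation matrix) that positions later text.
    private static final Set<String> KEEP_OUTSIDE_TEXT =
            new HashSet<>(Arrays.asList("q", "Q", "cm"));

    // Rough heuristic (an assumption): operands are numbers, /Names, (strings),
    // [arrays], <hex strings or dictionaries>, booleans and null.
    static boolean isOperand(String token) {
        if (token.isEmpty()) {
            return false;
        }
        char c = token.charAt(0);
        return Character.isDigit(c) || "+-./([<".indexOf(c) >= 0
                || token.equals("true") || token.equals("false") || token.equals("null");
    }

    static List<String> keepTextOnly(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> pendingOperands = new ArrayList<>();
        boolean inText = false;
        for (String token : tokens) {
            if (isOperand(token)) {
                // Operands precede their operator, so buffer them until it arrives.
                pendingOperands.add(token);
                continue;
            }
            if (token.equals("BT")) {
                inText = true;
            }
            if (inText || KEEP_OUTSIDE_TEXT.contains(token)) {
                out.addAll(pendingOperands);
                out.add(token);
            }
            if (token.equals("ET")) {
                inText = false;
            }
            // Either emitted or dropped, the buffered operands are consumed.
            pendingOperands.clear();
        }
        return out;
    }
}
```

Calling `TextOnlyFilter.keepTextOnly(tokens)` on a page's tokens would drop operators such as `re`, `f` and `Do` together with their operands, while everything between BT and ET passes through unchanged. The filtered tokens would then have to be written back as the page's content stream.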
The code to remove TJ and Tj is taken from THIS Stack Overflow post. But it only partially works: it removes just the images, leaving drawings and other art intact.
EDIT: Another option I can think of is to set the CMYK color used by all operators outside the BT ... ET blocks to white. That way the PDF would look text-only. Is this possible? If yes, please support your answer with code examples in PDFBox.
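A rough sketch of what I mean by this second option, again in plain Java over an already tokenized content stream rather than real PDFBox code. The class name `WhitenOutsideText` is hypothetical, and the sketch assumes a well-formed stream in which the four tokens before each `k` or `K` operator are its operands. It only touches the DeviceCMYK operators `k` and `K`; images, shadings and other color operators (`g`, `rg`, `sc`, ...) are left alone. I also suspect that a fill color set outside BT ... ET carries over into the following text object, so text that does not set its own color might turn white as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: forces CMYK colors set outside text objects to white.
class WhitenOutsideText {
    static List<String> whitenCmykOutsideText(List<String> tokens) {
        List<String> out = new ArrayList<>();
        boolean inText = false;
        for (String token : tokens) {
            if (token.equals("BT")) {
                inText = true;
            } else if (token.equals("ET")) {
                inText = false;
            }
            boolean cmykOperator = token.equals("k") || token.equals("K");
            if (!inText && cmykOperator && out.size() >= 4) {
                // The four operands (C M Y K) were already copied to the output;
                // overwrite them with 0 0 0 0, which is white in CMYK (no ink).
                for (int i = out.size() - 4; i < out.size(); i++) {
                    out.set(i, "0");
                }
            }
            out.add(token);
        }
        return out;
    }
}
```

With this, `0 0 0 1 k` outside a text object would become `0 0 0 0 k`, while the same operator between BT and ET stays as it is.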