I am running into an unusual issue. I create a PDF from HTML using wkhtmltopdf, and the resulting PDF looks fine.
But when I try to replace a word in one of its text strings using PDFBox, the replacement never happens.
The reason is that I get null characters (\u0000) in the strings I read from the operands of the text-showing operators (Tj and TJ).
My code:
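import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

protected static void replaceText(String word) throws IOException, COSVisitorException {
    PDPage page = page1; // page1, document and staticFileName are fields assigned at class level
    PDStream contents = page.getContents();
    PDFStreamParser parser = new PDFStreamParser(contents.getStream());
    parser.parse();
    List<Object> tokens = parser.getTokens();
    for (int i = 0; i < tokens.size(); i++) {
        Object next = tokens.get(i);
        if (next instanceof PDFOperator) {
            PDFOperator operator = (PDFOperator) next;
            if (operator.getOperation().equals("Tj")) {
                // Tj takes a single string operand, which precedes the operator
                COSString previous = (COSString) tokens.get(i - 1);
                String string = previous.getString(); // here I am getting \u0000, which is the null character
                List<String> listOfStrings = Arrays.asList(string.split(" "));
                if (listOfStrings.contains(word)) {
                    // Pattern.quote so the word is matched literally, not as a regex
                    string = string.replaceFirst(Pattern.quote(word), "");
                    previous.reset();
                    previous.append(string.getBytes(StandardCharsets.ISO_8859_1));
                }
            } else if (operator.getOperation().equals("TJ")) {
                // TJ takes an array of strings and positioning numbers
                COSArray previous = (COSArray) tokens.get(i - 1);
                for (int k = 0; k < previous.size(); k++) {
                    Object arrElement = previous.getObject(k);
                    if (arrElement instanceof COSString) {
                        COSString cosString = (COSString) arrElement;
                        String string = cosString.getString(); // same here
                        List<String> listOfStrings = Arrays.asList(string.split(" "));
                        if (listOfStrings.contains(word)) {
                            System.out.println(string);
                            string = string.replaceFirst(Pattern.quote(word), "");
                            cosString.reset();
                            cosString.append(string.getBytes(StandardCharsets.ISO_8859_1));
                        }
                    }
                }
            }
        }
    }
    // Write the (possibly modified) tokens back into a new content stream
    PDStream updatedStream = new PDStream(document);
    OutputStream outputStream = updatedStream.createOutputStream();
    ContentStreamWriter tokenWriter = new ContentStreamWriter(outputStream);
    tokenWriter.writeTokens(tokens);
    page.setContents(updatedStream);
    document.save(staticFileName);
}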
I am restricted to PDFBox 1.8.6 and cannot upgrade.
I have tested this code on other PDFs (ones not created by wkhtmltopdf) and it works fine there.
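One possible explanation for the difference, stated as an assumption rather than a confirmed diagnosis: if the wkhtmltopdf output embeds its fonts as composite (Type0) fonts with a two-byte encoding such as Identity-H, each glyph in a Tj/TJ string takes two bytes, and the high byte is usually 0x00. Reading those bytes one character per byte would then yield a \u0000 in front of every glyph code. The symptom can be reproduced without PDFBox in a minimal sketch; the class name, helper names, and byte values below are made up for illustration and are not taken from my file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of PDFBox.
class TwoByteStringDemo {

    // Decodes raw string bytes one byte per character,
    // which is how a single-byte (simple font) string would be read.
    static String decodeAsSingleByte(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Counts NUL (U+0000) characters in the decoded text.
    static int countNulls(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\0') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumed operand of a Tj operator: three two-byte glyph codes (made-up values).
        byte[] raw = {0x00, 0x2B, 0x00, 0x48, 0x00, 0x4F};
        String decoded = decodeAsSingleByte(raw);
        System.out.println("decoded length: " + decoded.length());
        System.out.println("NUL characters: " + countNulls(decoded));
    }
}
```

If this is what is happening, the two-byte values would be glyph codes rather than character codes, so stripping the null characters would not necessarily give back the visible text, and split(" ") followed by contains(word) could never match. Whether my file actually uses such a font should be visible in the page's font resources (Subtype Type0, Encoding Identity-H).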