
When converting a PDF file to Word with the Aspose library, I get a badly formed Word document: the text is split into blocks.

Our client does not accept these blocks; he wants a Word document that looks like the original Office file. Do you have any ideas, please?

Cindy Meister
Lemjid

1 Answer


Please note that by default every visually grouped block of text in the original PDF file is converted into a textbox in the resulting document. This achieves maximal resemblance of the output document to the original PDF file. The output document will look good, but it will consist entirely of textboxes and it could make further editing of the document in Microsoft Word quite difficult.

To get output without textboxes, please use the Flow recognition mode:

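import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

// dataDir is the path of the folder that holds the input and output files
// Load source PDF file
Document doc = new Document(dataDir + "input.pdf");
// Instantiate DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Set recognition mode to Flow so the output is not built from textboxes
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Save resultant DOCX file
doc.save(dataDir + "output.docx", saveOptions);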

In this mode the engine performs grouping and multi-level analysis to restore the original document author's intent and produce a maximally editable document. The downside is that the output document might look different from the original PDF file.
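
The trade-off between the two modes can be written down as a small helper. This is only a minimal sketch and not part of the Aspose API: the `RecognitionModeChooser` class, the `ConversionGoal` enum, and the `modeNameFor` method are hypothetical names, and the returned strings are assumed to mirror the two modes discussed here (the default textbox-based one and Flow). The returned name would still need to be mapped to the library's own mode constant before calling `setMode`.

```java
// Hypothetical helper, not part of Aspose.PDF: picks a recognition mode name
// based on what matters more for the converted document.
class RecognitionModeChooser {

    // What the caller cares about most in the output DOCX.
    enum ConversionGoal { VISUAL_FIDELITY, EDITABILITY }

    // Returns "Flow" when editability matters most (layout may differ from the PDF),
    // otherwise "Textbox" (layout preserved, but the document consists of textboxes).
    static String modeNameFor(ConversionGoal goal) {
        switch (goal) {
            case EDITABILITY:
                return "Flow";
            case VISUAL_FIDELITY:
            default:
                return "Textbox";
        }
    }

    public static void main(String[] args) {
        System.out.println(modeNameFor(ConversionGoal.EDITABILITY));
        System.out.println(modeNameFor(ConversionGoal.VISUAL_FIDELITY));
    }
}
```

For a client who needs to keep editing the document in Microsoft Word, as in the question, the editability goal is the one that applies.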

We hope this will be helpful. Please feel free to contact us if you need any further assistance.

PS: I work with Aspose as a Developer Evangelist.

Farhan Raza