I have the following use case:
I receive one PDF that contains many documents. Each document has a different number of pages, and the documents are separated by a barcode page.
Is it possible to split a multipage PDF containing several documents that are separated by a page with a barcode, and create new PDFs, one for each document?
I read that a PDF can be split with iText: https://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-splitting-pdf-file
But I can't find anything on the web about how to split it at the pages where I detect a barcode.
UPDATE: @mkl I have found how to read the text from a QR code with zxing. It works with a simple PNG file:
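To make the goal concrete, here is a minimal sketch of the grouping step only, assuming the barcode pages have already been detected somehow. The class name `SeparatorRanges` and the method `documentRanges` are hypothetical names used for illustration; they are not part of iText or zxing. The sketch also assumes that separator pages should be dropped from the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SeparatorRanges {

    /**
     * Returns inclusive {firstPage, lastPage} ranges (1-based) for the documents
     * that lie between separator pages. Separator pages themselves are dropped.
     */
    static List<int[]> documentRanges(int pageCount, Set<Integer> separatorPages) {
        List<int[]> ranges = new ArrayList<>();
        int start = 1;
        // Walk the separators in ascending order, whatever Set type was passed in.
        for (int sep : new TreeSet<>(separatorPages)) {
            if (sep < 1 || sep > pageCount) {
                continue; // ignore page numbers outside the document
            }
            if (sep > start) {
                ranges.add(new int[] {start, sep - 1});
            }
            start = sep + 1;
        }
        // Pages after the last separator form the final document.
        if (start <= pageCount) {
            ranges.add(new int[] {start, pageCount});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical input: 7 pages, barcode pages detected at pages 1 and 5.
        for (int[] r : documentRanges(7, new TreeSet<>(Arrays.asList(1, 5)))) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

With such ranges in hand, each one could presumably be copied into its own output file, for example with iText 7's `PdfDocument.copyPagesTo(pageFrom, pageTo, toDocument)`; that part is not shown here.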
File QRfile = new File("test.png");
BufferedImage bufferedImg = ImageIO.read(QRfile);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
Result result = new MultiFormatReader().decode(bitmap);
System.out.println("Barcode Format: " + result.getBarcodeFormat());
System.out.println("Content: " + result.getText());
But it doesn't work in a loop. I tested with a PDF document (7 pages).
Here is the Java code:
PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK");
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
for (int page = 1; page <= pdfDoc.getNumberOfPages(); page++)
{
    logger.debug("page: " + page);
    contentParser.processContent(page, new IEventListener()
    {
        // Image counter; a new listener is created for each page, so it restarts at 0 on every page
        int index = 0;
        @Override
        public Set<EventType> getSupportedEvents()
        {
            logger.debug("inside getSupportedEvents");
            return Collections.singleton(EventType.RENDER_IMAGE);
        }
        @Override
        public void eventOccurred(IEventData data, EventType type)
        {
            index = index + 1;
            logger.debug("inside eventOccurred - data: " + data);
            logger.debug("inside eventOccurred - type: " + type);
            logger.debug("inside eventOccurred - index: " + index);
            if (data instanceof ImageRenderInfo)
            {
                logger.debug("data instanceof ImageRenderInfo");
                ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
                byte[] bytes = imageRenderInfo.getImage().getImageBytes();
                try
                {
                    logger.debug("avant Files writer");
                    // The bytes are saved under a .png name whatever their actual format is
                    String pngName = "C:/alfresco/klinck/splitImage-" + index + ".png";
                    logger.debug("pngName: " + pngName);
                    Files.write(new File(pngName).toPath(), bytes);
                    logger.debug("Files written");
                    // Read the file back and try to decode it with zxing
                    File QRfile = new File(pngName);
                    logger.debug("QR File trouvé ! ");
                    BufferedImage bufferedImg = ImageIO.read(QRfile);
                    logger.debug("bufferedImg OK ");
                    LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
                    logger.debug("source OK ");
                    BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
                    logger.debug("bitmap OK");
                    Result result = new MultiFormatReader().decode(bitmap);
                    logger.debug("SplitFluxJobExcecuter - resultBarcodeFormat: " + result.getBarcodeFormat());
                    logger.debug("SplitFluxJobExcecuter - result.getText(): " + result.getText());
                }
                catch (Exception e)
                {
                    logger.error("SplitJobExecuter Exception : " + ExceptionUtils.getStackTrace(e));
                }
            }
        }
    });
}
The first page contains 3 images (one of them is the QR code). I get a "com.google.zxing.NotFoundException" at the last step (the decode call).
This is the log:
2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pdfDoc OK
2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 1
2018-07-25 16:27:00,237 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,265 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@2472ac79
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,304 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,305 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,306 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,407 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6e036aea
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 2
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-2.png
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,473 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@4c205db7
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 3
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-3.png
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,484 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
From page 2 to page 7, the error is different:
2018-07-25 16:27:00,487 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6d41ffa2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,492 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,493 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : java.lang.NullPointerException
at com.google.zxing.client.j2se.BufferedImageLuminanceSource.<init>(BufferedImageLuminanceSource.java:42)
at com.klinck.mc.jobs.SplitFluxJobExecuter$1.eventOccurred(SplitFluxJobExecuter.java:150)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.eventOccurred(PdfCanvasProcessor.java:534)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayImage(PdfCanvasProcessor.java:573)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5800(PdfCanvasProcessor.java:108)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$ImageXObjectDoHandler.handleXObject(PdfCanvasProcessor.java:1420)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayXObject(PdfCanvasProcessor.java:566)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5600(PdfCanvasProcessor.java:108)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$DoOperator.invoke(PdfCanvasProcessor.java:1285)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:452)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:281)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:302)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
at com.klinck.mc.jobs.SplitFluxJobExecuter.execute(SplitFluxJobExecuter.java:118)
at com.klinck.mc.jobs.SplitFluxJob$1.doWork(SplitFluxJob.java:27)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at com.klinck.mc.jobs.SplitFluxJob.executeJob(SplitFluxJob.java:24)
at org.alfresco.schedule.ScheduledJobLockExecuter.execute(ScheduledJobLockExecuter.java:94)
at org.alfresco.schedule.AbstractScheduledLockedJob.executeInternal(AbstractScheduledLockedJob.java:72)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
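The NullPointerException inside the `BufferedImageLuminanceSource` constructor is consistent with `ImageIO.read` having returned `null`, which it does when no registered image reader recognizes the data. The extracted bytes are not guaranteed to be PNG just because the file is named `.png`. Below is a minimal sketch of a null-safe decoding step that works directly on the byte array instead of going through a file. The class name `SafeImageDecoder` and its `decode` method are hypothetical helpers, and whether this is the actual cause on pages 2 to 7 is an assumption.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Optional;
import javax.imageio.ImageIO;

class SafeImageDecoder {

    /**
     * Tries to decode raw image bytes with the JDK's ImageIO.
     * Returns Optional.empty() when no registered reader understands the
     * format (ImageIO.read returns null) or when reading fails.
     */
    static Optional<BufferedImage> decode(byte[] bytes) {
        try {
            return Optional.ofNullable(ImageIO.read(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Three arbitrary bytes are not a valid image in any supported format.
        System.out.println(decode(new byte[] {1, 2, 3}).isPresent());
    }
}
```

In the listener, an empty result could be logged and the image skipped before any zxing class is touched, so an unsupported image format would no longer surface as a NullPointerException.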
UPDATE 2
I think the "com.google.zxing.NotFoundException" appears because the images don't contain an encoded text message or are too large; see the question "com.google.zxing.NotFoundException exception comes when core java program executed?"
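If image size really is part of the problem, one thing to try would be to shrink large images before passing them to zxing. The following is only a sketch of that idea using the JDK's own imaging classes. `ImageDownscaler`, `fitWithin` and the `maxSide` parameter are hypothetical names, and it is an untested assumption that downscaling helps the decoder here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class ImageDownscaler {

    /**
     * Returns the image unchanged if its longest side is at most maxSide,
     * otherwise a proportionally scaled RGB copy whose longest side equals maxSide.
     */
    static BufferedImage fitWithin(BufferedImage src, int maxSide) {
        int w = src.getWidth();
        int h = src.getHeight();
        int longest = Math.max(w, h);
        if (longest <= maxSide) {
            return src;
        }
        double scale = (double) maxSide / longest;
        int newW = Math.max(1, (int) Math.round(w * scale));
        int newH = Math.max(1, (int) Math.round(h * scale));
        BufferedImage dst = new BufferedImage(newW, newH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation gives a smoother result than the default
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newW, newH, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage big = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = fitWithin(big, 100);
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

The scaled image could then be wrapped in a `BufferedImageLuminanceSource` exactly as in the code above. A suitable `maxSide` would have to be found by experiment.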