
I need to extract table cells as images. The cells may contain mixed content (text + image), which I need to merge into a single image. I am able to get the plain text, but I don't know how to get the image together with the text. I'm not sure whether Apache POI can help.

Has anyone done something like this earlier?

import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Prints the text of every cell of every table in the document.
public static void readTablesDataInDocx(XWPFDocument doc) {
    int tableIdx = 1;
    List<XWPFTable> table = doc.getTables();
    System.out.println("==========No Of Tables in Document=============================================" + table.size());
    for (XWPFTable xwpfTable : table) {
        System.out.println("================table -" + tableIdx + "===Data==");
        int rowIdx = 1; // row numbering restarts for each table
        for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
            System.out.println("Row -" + rowIdx);
            int colIdx = 1; // column numbering restarts for each row
            for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                if (xwpfTableCell != null) {
                    System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());
                }
                colIdx++;
            }
            System.out.println("");
            rowIdx++;
        }
        tableIdx++;
        System.out.println("");
    }
}

With this I am able to get the text of each cell, using this call:

System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());

How do I get the image if a cell also contains one?

Vogel612
KuldeeP ChoudharY
  • Try getting the paragraphs in the cell with `getParagraphs()`, then for each paragraph get the runs with `getRuns()`. This returns a list of [XWPFRun](https://poi.apache.org/apidocs/org/apache/poi/xwpf/usermodel/XWPFRun.html) objects, each of which has a method that lets you get its pictures: `getEmbeddedPictures()`. – iggymoran Jun 22 '16 at 08:16
  • @iggymoran `List para = xwpfTableCell.getParagraphs(); if (para != null) { XWPFRun xWPFRun = (XWPFRun) para.get(i); for (int l = 0; l < para.size(); l++) { System.out.print("\t" + colIdx + "- column Image: " + xWPFRun.getEmbeddedPictures()); } }` I'm getting a class cast exception. – KuldeeP ChoudharY Jun 22 '16 at 09:09
  • You want to try something like `para.getRuns()`, verify that the runs are not null, and then call `run.getEmbeddedPictures()`. – iggymoran Jun 22 '16 at 09:12
  • `para` is a `List`, so how do I call `getRuns()` on it? – KuldeeP ChoudharY Jun 22 '16 at 09:25
  • I've posted an answer as it can be more descriptive than comments, but you can call the method once you've verified that your cell is not null. – iggymoran Jun 22 '16 at 10:07

2 Answers


Try this code; it works for me:

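import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// try-with-resources closes both the stream and the document when done
try (FileInputStream in = new FileInputStream(fileName);
     XWPFDocument doc = new XWPFDocument(in)) {
    for (XWPFTable xwpfTable : doc.getTables()) {
        for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
            for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                if (xwpfTableCell != null) {
                    System.out.println(xwpfTableCell.getText());
                    // Pictures live in the runs of the cell's paragraphs
                    for (XWPFParagraph p : xwpfTableCell.getParagraphs()) {
                        for (XWPFRun run : p.getRuns()) {
                            for (XWPFPicture pic : run.getEmbeddedPictures()) {
                                byte[] pictureData = pic.getPictureData().getData();
                                // Printing the array itself would only show its reference, so print its size
                                System.out.println("picture : " + pic.getPictureData().getFileName()
                                        + " (" + pictureData.length + " bytes)");
                            }
                        }
                    }
                }
            }
        }
    }
}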

Once you have a cell, you can get hold of the paragraphs that make up that cell. Each paragraph is in turn made up of runs, which you can obtain by calling the `getRuns()` method. Runs themselves can contain embedded images, which you can obtain by calling the `getEmbeddedPictures()` method.

You can therefore have a method that gets the embedded pictures of a cell:

public static void printDescriptionOfImagesInCell(XWPFTableCell cell) {
    List<XWPFParagraph> paragraphs = cell.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        List<XWPFRun> runs = paragraph.getRuns();
        for (XWPFRun run : runs) {
            List<XWPFPicture> pictures = run.getEmbeddedPictures();
            for (XWPFPicture picture : pictures) {
                //Do anything you want with the picture:
                System.out.println("Picture: " + picture.getDescription());
            }
        }
    }
}

You should be able to find out more about the actual pictures from the XWPFPicture documentation, and change the method to get the image data, file name, etc.
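For the "merge into a single image" part of the question, here is a rough sketch using only the JDK (`java.awt` and `javax.imageio`), not Apache POI. `CellImageComposer` and its methods are hypothetical names I made up for illustration. It assumes the picture bytes are what `picture.getPictureData().getData()` returns, that the format is one `ImageIO` can decode (PNG and JPEG usually are, EMF/WMF are not), and that drawing the text as a single line above the pictures is an acceptable layout. The font, padding and white background are arbitrary choices.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

/** Hypothetical helper (not part of Apache POI): merges a cell's text and pictures into one image. */
final class CellImageComposer {

    private static final int PADDING = 4; // arbitrary margin in pixels

    /** Decodes raw picture bytes, e.g. the array returned by getPictureData().getData(). */
    static BufferedImage decode(byte[] pictureBytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(pictureBytes));
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the format
            throw new IOException("Unsupported image format");
        }
        return image;
    }

    /** Draws one line of black text on a white image sized to fit it. */
    static BufferedImage renderText(String text) {
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 14);
        // A throwaway 1x1 image is used only to measure the text
        Graphics2D probe = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB).createGraphics();
        FontMetrics metrics = probe.getFontMetrics(font);
        int width = Math.max(1, metrics.stringWidth(text)) + 2 * PADDING;
        int height = metrics.getHeight() + 2 * PADDING;
        int ascent = metrics.getAscent();
        probe.dispose();

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);
        g.setFont(font);
        g.drawString(text, PADDING, PADDING + ascent);
        g.dispose();
        return image;
    }

    /** Stacks the parts top to bottom on a white canvas as wide as the widest part. */
    static BufferedImage stackVertically(List<BufferedImage> parts) {
        int width = 1;
        int height = 0;
        for (BufferedImage part : parts) {
            width = Math.max(width, part.getWidth());
            height += part.getHeight();
        }
        BufferedImage canvas = new BufferedImage(width, Math.max(1, height), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, canvas.getWidth(), canvas.getHeight());
        int y = 0;
        for (BufferedImage part : parts) {
            g.drawImage(part, 0, y, null);
            y += part.getHeight();
        }
        g.dispose();
        return canvas;
    }

    /** Puts the cell text first (if any), then each decoded picture below it. */
    static BufferedImage compose(String cellText, List<byte[]> pictures) throws IOException {
        List<BufferedImage> parts = new ArrayList<>();
        if (cellText != null && !cellText.isEmpty()) {
            parts.add(renderText(cellText));
        }
        for (byte[] bytes : pictures) {
            parts.add(decode(bytes));
        }
        return stackVertically(parts);
    }
}
```

In the loop over the cell's pictures you would collect each `picture.getPictureData().getData()` into a `List<byte[]>`, call `CellImageComposer.compose(cell.getText(), thatList)`, and write the result with `ImageIO.write(result, "png", someFile)`. Note that this does not reproduce the cell's real layout from Word (fonts, wrapping, inline picture position); it only places the text and pictures one under another.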

iggymoran