1

Using workbook.getAllPictures() I can get an array of picture data but unfortunately it is only the data and those objects have no methods for accessing the name of the picture or any other related information.

There is a HSSFPicture class which would contain all the details of the picture but how to get for example an array of those objects from the xls?

Update:

Found SO question How can I find a cell, which contain a picture in apache poi which has a method for looping through all the pictures in the worksheet. That works.

Now that I was able to try the HSSFPicture class I found out that the getFileName() method is returning the file name without the extension. I can use the getPictureData().suggestFileExtension() to get a suggested file extension but I really would need to get the extension the picture had when it was added into the xls file. Would there be a way to get it?

Update 2:

The pictures are added into the xls with a macro. This is the part of macro that is adding the images into the sheet. fname is the full path and imageName is the file name, both are including the extension.

Set img = Sheets("Receipt images").Pictures.Insert(fname)
img.Left = 10
img.top = top + 10
img.Name = imageName
Set img = Nothing

The routine to check if the picture already exists in the Excel file.

For Each img In Sheets("Receipt images").Shapes
    If img.Name = imageName Then
        Set foundImage = img
        Exit For
    End If
Next

This recognizes that "image.jpg" is different from "image.gif", so the img.Name includes the extension.

Community
  • 1
  • 1
Mika
  • 1,419
  • 18
  • 37
  • Are you sure that information actually gets stored in the Excel file? Is it something that Excel is able to extract? – Gagravarr Nov 19 '14 at 16:03
  • I'm pretty sure it is in the Excel file because we add the pictures into the file using a macro which checks that if a picture with the same name is already added it is not added twice. And this works with pictures which have otherwise same name but different extension. – Mika Nov 20 '14 at 06:29
  • `suggestFileExtension()` is exactly what is says. For a 'gif' image is suggests 'png' and for a 'jpg' it suggests 'jpeg'. – Mika Nov 20 '14 at 06:53
  • Just for the record, the original solution to this issue is in this [patch](http://pastebin.com/8MdM7zh5). So two years later, I've fiddled around with it again and provided POI patches [#60519](https://bz.apache.org/bugzilla/show_bug.cgi?id=60519), [#60520](https://bz.apache.org/bugzilla/show_bug.cgi?id=60520) and [#60521](https://bz.apache.org/bugzilla/show_bug.cgi?id=60521) – kiwiwings Dec 26 '16 at 23:35

2 Answers2

1

The shape names are not in the default POI objects. So if we need them we have to deal with the underlying objects. That is for the shapes in HSSF mainly the EscherAggregate (http://poi.apache.org/apidocs/org/apache/poi/hssf/record/EscherAggregate.html) which we can get from the sheet. From its parent class AbstractEscherHolderRecord we can get all EscherOptRecords which contains the options of the shapes. In those options are also to find the groupshape.shapenames.

My example is not the complete solution. It is only provided to show which objects could be used to achieve this.

Example:

import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;

import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

import org.apache.poi.hssf.record.*;
import org.apache.poi.ddf.*;

import java.util.List;
import java.util.ArrayList;

class ShapeNameTestHSSF {

 public static void main(String[] args) {
  try {

   InputStream inp = new FileInputStream("workbook1.xls");
   Workbook wb = WorkbookFactory.create(inp);

   Sheet sheet = wb.getSheetAt(0);

   EscherAggregate escherAggregate = ((HSSFSheet)sheet).getDrawingEscherAggregate();

   EscherContainerRecord escherContainer = escherAggregate.getEscherContainer().getChildContainers().get(0); 
   //throws java.lang.NullPointerException if no Container present

   List<EscherRecord> escherOptRecords = new ArrayList<EscherRecord>();

   escherContainer.getRecordsById(EscherOptRecord.RECORD_ID, escherOptRecords);

   for (EscherRecord escherOptRecord : escherOptRecords) {
    for (EscherProperty escherProperty : ((EscherOptRecord)escherOptRecord).getEscherProperties()) {
     System.out.println(escherProperty.getName());
     if (escherProperty.isComplex()) {
      System.out.println(new String(((EscherComplexProperty)escherProperty).getComplexData(), "UTF-16LE"));
     } else {
      if (escherProperty.isBlipId()) System.out.print("BlipId = ImageId = ");
      System.out.println(((EscherSimpleProperty)escherProperty).getPropertyValue());
     }
     System.out.println("=============================");
    }
    System.out.println(":::::::::::::::::::::::::::::");
   }


   FileOutputStream fileOut = new FileOutputStream("workbook1.xls");
   wb.write(fileOut);
   fileOut.flush();
   fileOut.close();

  } catch (InvalidFormatException ifex) {
  } catch (FileNotFoundException fnfex) {
  } catch (IOException ioex) {
  }
 }
}

Again: This is not a ready to use solution. A ready to use solution cannot be provided here, because of the complexity of the EscherRecords. Maybe to get the correct EscherRecords for the image shapes and their related EscherOptRecords, you have recursive to loop through all EscherRecords in the EscherAggregate checking whether they are ContainerRecords and if so loop through its children and so on.

Axel Richter
  • 56,077
  • 6
  • 60
  • 87
  • Phief, This goes a little complex. I think we will restrict in our macro that a user cannot add images with the same base name but different extension so that we won't need to go into the EscherRecords. – Mika Nov 24 '14 at 07:04
  • Maybe the apache programmers will add the data such as name and description from the EscherOptRecord to the HSSFShape objects in future. This project seems highly in development until now. – Axel Richter Nov 24 '14 at 07:21
  • If it only is to get the right file extension, then see http://poi.apache.org/apidocs/org/apache/poi/hssf/usermodel/HSSFPictureData.html#getPictureType%28%29 Seems to be new. Is not in version 3.10. – Axel Richter Nov 24 '14 at 13:46
0

Start here:

http://poi.apache.org/spreadsheet/quick-guide.html#Images

this tutorial can help you to extract an image's information from an xls spreadsheet using Apache POI

BeginMoh
  • 123
  • 1
  • 11