6

Is it possible to extract chart information from an Office 2007 (xlsx / OpenXML) spreadsheet using Apache POI? I've managed to read in the spreadsheet and even get the part that refers to the chart but not sure how I can retrieve any info from this part e.g. Type of chart, chart data etc.

XSSFWorkbook xwb = new XSSFWorkbook("charts_lines.xlsx");

XSSFSheet sheet = xwb.getSheetAt(0);

I can also iterate through the package parts to retrieve the chart part, but I don't see how I then go on to retrieve any info about the chart?

Note, I'm not interested in creating charts using POI, just read as much chart info as is possible to do...I'm also not saving an xlsx. I simply wish to extract line colours, labels, data, chart type (pie, line, bar etc.)

Remi Guan
  • 21,506
  • 17
  • 64
  • 87
Aodh
  • 662
  • 1
  • 7
  • 24

4 Answers4

7

There isn't a high level representation at the moment, so you'll need to drop down into the xmlbeans level and work with the low level CT* objects.

For Chart Sheets, there's XSSFChartSheet which will give you a CTChartsheet object, which has a little bit of info.

For both XSSFChart and XSSFChartSheet (regular and chart sheets), you'll need to go via the drawings to get the charts. Each sheet with charts on it should have one Drawing, and the charts get linked from the drawing, rather than the sheet itself.

As of r1090442 (so POI 3.8 or newer), there's a method on XSSFDrawing to give you all the XSSFChart objects (which are wrappers around the /charts/chart#.xml part). If you're on a really really old version of POI, use the CTDrawing to get the details of the chart, grab the /charts/chart#.xml part that corresponts, and then have xmlbeans give you the CT objects for it. Either way that'll let you get the titles, types, data ranges etc.

It is a bit fiddly though, so do please consider sending in a patch to POI if you get something good worked out for working with the CTChart objects!

Gagravarr
  • 47,320
  • 10
  • 111
  • 156
  • As of r1090442, you should be able to get all the Charts easily from the XSSFDrawing of a XSSFSheet, hope that helps! – Gagravarr Apr 08 '11 at 20:52
  • Thanks for that. Just wondering though, I've downloaded 3.8 beta 1 and there doesn't seem to be an XSSFDrawing member in XSSFSheet? – Aodh Apr 13 '11 at 10:42
  • The method you want is http://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFSheet.html#createDrawingPatriarch() . You probably also want a nightly svn build for the new method to get charts from the drawing. – Gagravarr Apr 14 '11 at 09:29
  • Thanks @Gagravarr I got the latest from SVN. The issue now is figuring out how to extract the series values etc. Thank you very much :) – Aodh Apr 14 '11 at 13:21
  • These functions are in the latest version of Apache POI (v3.11). xssfsheet.[createDrawingPatriarch](http://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFSheet.html#createDrawingPatriarch())().[getCharts](http://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFDrawing.html#getCharts())() is the chain you want. – KevinL Feb 10 '15 at 15:49
  • 1
    @KevinL [getDrawingPatriarch()](http://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFSheet.html#getDrawingPatriarch%28%29) might be better - that'll avoid creating a new empty drawing if there are none – Gagravarr Feb 10 '15 at 15:53
5

you can read chart data as XML using XSSFDrawing

like

 XSSFDrawing drawing = ((XSSFSheet)sheet).createDrawingPatriarch();
        System.out.println(drawing.getCTDrawing().toString());

will print whole chart as XMl and also using

drawing.getCharts();

you can add Iterator to it to browse chart

nsgulliver
  • 12,655
  • 23
  • 43
  • 64
Mohit rathod
  • 51
  • 1
  • 2
2

I don't know the exact answer to your question, but the OpenXML SDK 2.0 comes with a DocumentReflector.exe tool that will show you exactly how the chart is defined (including all relationships between the SpreadsheetML and the DrawingML packages). There is some more info on this tool in this article.

Ben Hoffstein
  • 102,129
  • 8
  • 104
  • 120
1

Yes, It is possible to read any type of chart using Apache POI. But Before reading any chart information you need to know what XML string you are receiving because this could be different based on the chart type i.e. pie, line, bar, scatter or mixed (a combination of two or more) charts, etc. Therefore you approach will be different for different type of chart.

For a simple bar chart like this:

enter image description here

Your XML will look something like this:

<xml-fragment ...>
  <c:title>
    <c:tx>
      <c:rich>
        ...
        <a:p>
          ...
          <a:r>
            ...
            <a:t>Employee Salary</a:t>
          </a:r>
        </a:p>
      </c:rich>
    </c:tx>
    ...
  </c:title>
  ...
  <c:plotArea>
    ...
    <c:barChart>
      ...
      <c:ser>
        ...
        <c:cat>
          <c:strRef>
            ...
            <c:strCache>
              <c:ptCount val="5"/>
              <c:pt idx="0">
                <c:v>Tom</c:v>
              </c:pt>
              <c:pt idx="1">
                <c:v>John</c:v>
              </c:pt>
              <c:pt idx="2">
                <c:v>Harry</c:v>
              </c:pt>
              <c:pt idx="3">
                <c:v>Sam</c:v>
              </c:pt>
              <c:pt idx="4">
                <c:v>Richa</c:v>
              </c:pt>
            </c:strCache>
          </c:strRef>
        </c:cat>
        <c:val>
          <c:numRef>
            ...
            <c:numCache>
              <c:formatCode>"$"#,##0</c:formatCode>
              <c:ptCount val="5"/>
              <c:pt idx="0">
                <c:v>1000</c:v>
              </c:pt>
              <c:pt idx="1">
                <c:v>700</c:v>
              </c:pt>
              <c:pt idx="2">
                <c:v>300</c:v>
              </c:pt>
              <c:pt idx="3">
                <c:v>900</c:v>
              </c:pt>
              <c:pt idx="4">
                <c:v>800</c:v>
              </c:pt>
            </c:numCache>
          </c:numRef>
        </c:val>
        ...
      </c:ser>
      ...
    </c:barChart>
    ...
  </c:plotArea>
  ...
</xml-fragment>

Now based on the above XML String we can use CT* classes and its various methods to traverse through the whole XML using Apache POI. Let's see how to read Chart Title, Labels(Employee names) and Series(Employee salaries) using POI:

Workbook workbook = new XSSFWorkbook(new File(PATH));
Sheet sheet = workbook.getSheet("GraphSheet");

XSSFSheet xsheet = (XSSFSheet) sheet;
XSSFDrawing drawing = xsheet.getDrawingPatriarch();
if (drawing != null) {
    List<XSSFChart> charts = drawing.getCharts();

    for (int chartIndex = 0; charts != null && chartIndex < (charts.size()); chartIndex++) {

        XSSFChart chart = charts.get(chartIndex);

        CTChart chart2 = chart.getCTChart();
        CTPlotArea plot = chart2.getPlotArea();
        System.out.println("Chart Title :" + chart2.getTitle().getTx().getRich().getPArray(0).getRArray(0).getT());

        CTBarSer[] ctScaSerList = plot.getBarChartArray(0).getSerArray();

        for (CTBarSer ctLineSer : ctScaSerList) {

            CTStrVal[] ctStrVals = ctLineSer.getCat().getStrRef().getStrCache().getPtArray();
            for (int i = 0; i < ctStrVals.length; i++) {
                System.out.print(ctStrVals[i].getV() + ",");
            }
            System.out.println();
            CTNumVal[] ctXNumVal = ctLineSer.getVal().getNumRef().getNumCache().getPtArray();

            for (int i = 0; i < ctXNumVal.length; i++) {
                System.out.print(ctXNumVal[i].getV() + ",");
            }

        }
    }
}

Console:

Chart Title :Employee Salary
Tom,John,Harry,Sam,Richa,
1000,700,300,900,800,

Note: Here, the idea is to first read the XML String(because could be different based on your graph type) and then traverse the whole XML accordingly.

vikasrajputin
  • 81
  • 1
  • 5
  • how do i get the position of a charts and colors used? – Madhu Tomy Nov 26 '20 at 18:15
  • depends on what type of chart you have, simply your approach should be, first print the XML somewhere, analyze that XML carefully, surely you will find a common pattern in charts (from which you can get the position) and you might also find colors of the chart (about the color I am not 100% sure). Once you get all the information then use CT* classes to get the values of those properties. – vikasrajputin Apr 10 '21 at 02:42