I have written a program that converts PPTX to PNG. The conversion works fine, except that wherever the PPTX file contains a Unicode character, it is rendered as a junk character. Here is the code. I tried adding fonts, but that did not help. The PPTX contains the text "/ˌinəˈvāSHən/". The letters i, n, v, a, S, H, n are converted fine, but the others are not.

    FileInputStream is = new FileInputStream(strTempPath);
    XMLSlideShow pptx = new XMLSlideShow(is);
    is.close();
    double zoom = 2; // magnify it by 2
    AffineTransform at = new AffineTransform();
    at.setToScale(zoom, zoom);
    Dimension pgsize = pptx.getPageSize();             
    XSLFSlide[] slide = pptx.getSlides();

    // BufferedImage img = new BufferedImage((int)Math.ceil(pgsize.width*zoom), (int)Math.ceil(pgsize.height*zoom), BufferedImage.TYPE_INT_RGB);
    BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
    Graphics2D graphics = img.createGraphics();
    //graphics.setTransform(at);                
    graphics.setPaint(Color.white);
    graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
    slide[iPageNo].draw(graphics);             
    // FileOutputStream output = new ByteArrayOutputStream("C:/Temp/aspose/word/slide-" + (10 + 1) + ".png");        
    output = new ByteArrayOutputStream();
    javax.imageio.ImageIO.write(img, "png", output);

This is how I tried to register the font, but the characters were still not rendered correctly.

        // Load the font file once, register it, then derive the 12pt variant
        Font baseFont = Font.createFont(Font.TRUETYPE_FONT, new File("/usr/share/fonts/GEInspRg.ttf"));
        GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
        ge.registerFont(baseFont);
        Font customFont = baseFont.deriveFont(12f);
        graphics.setFont(customFont);

Here is the complete code. My test PPTX contains the word /ˌinəˈvāSHən/ in addition to ordinary English words.

package foo;

import java.awt.Dimension; 
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFSlide;

public class PPTXToPNG {

public static void main(String[] args) throws Exception {

    FileInputStream is = new FileInputStream("C:/Temp/PPTXToImage/unicode_test.pptx");      

    XMLSlideShow ppt = new XMLSlideShow(is);
    is.close();
    double zoom = 2;
    AffineTransform at = new AffineTransform();
    at.setToScale(zoom, zoom);
    Dimension pgsize = ppt.getPageSize();
    XSLFSlide[] slide = ppt.getSlides();

    BufferedImage img = new BufferedImage((int)Math.ceil(pgsize.width*zoom),
            (int)Math.ceil(pgsize.height*zoom), BufferedImage.TYPE_INT_RGB);
    Graphics2D graphics = img.createGraphics();

    graphics.setTransform(at);
    graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));

    // Draw first page in the PPTX. First page starts at 0 position
    slide[0].draw(graphics);

    FileOutputStream out = new FileOutputStream("C:/Temp/PPTXToImage/ConvertedSlide.png");  
    javax.imageio.ImageIO.write(img, "png", out);
    out.close();
    System.out.println("DONE");

   }
}
user2565431
  • Have you tried to debug it, i.e. set the breakpoint in `org.apache.poi.xslf.usermodel.TextFragment.draw()`? Check the AttributedCharacterIterator, if it really contains a reference to your font - sometimes the Font object is cloned and the fontfamily suddenly switches back to the "Dialog" type – kiwiwings Aug 28 '13 at 22:47
  • "a junk character" -- *any* junk, or the default "Not Available" character for this font? The latter would indicate that this font does not contain the requested characters. *Random* junk, on the other hand, indicates that your workflow does not support the necessary conversions from one character encoding to another (UTF-8? Unicode?) – Jongware Aug 28 '13 at 23:02

1 Answer

As Jongware pointed out above, the characters are not available in the "GE Inspira" font, as you can see in the example program below, so you'll need some /ˌinəˈvāSHən/ (innovation) to get around that ;)

There are several approaches I can think of:

  • I'm not sure whether the graphics.setFont(customFont); call in your code was just a test, but normally POI will use (and set) the font that is specified in the document. So the easiest fix would be to replace the font in the original document with a font that supports phonetic characters (see the Wikipedia Unicode article for suitable fonts). By the way, if you try to use that font in LibreOffice and insert these phonetic characters, you'll also get "junk" chars.

  • You could use a tool such as FontForge to copy the missing characters from a different font into your preferred font (but of course the document still needs to use that font, see above). It would look a bit strange, but better than rectangles ...

  • You could check beforehand whether the characters in each text run are supported by the specified font, and insert a new text run with an alternative font for the unsupported characters.

  • I know that PDFs have some kind of font substitution when a font (or even a single character?) can't be found. I haven't found a similar mechanism for Java in a short search, but maybe there is a solution along those lines ...
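
The third approach (splitting text by font coverage) could be sketched roughly as follows. This is only an illustration, not POI API: `RunSplitter`, `Segment`, and `split` are hypothetical names. The coverage check is passed in as an `IntPredicate`, assuming that in real use one would pass something like `font::canDisplay` for a `java.awt.Font`; each resulting segment would then become its own text run, with the fallback font applied to the unsupported ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of POI: splits text into maximal segments
// whose code points are either all supported or all unsupported.
class RunSplitter {

    // One contiguous piece of text plus whether the font can display it.
    static final class Segment {
        final String text;
        final boolean supported;

        Segment(String text, boolean supported) {
            this.text = text;
            this.supported = supported;
        }
    }

    // 'canDisplay' decides per code point; with AWT one could pass font::canDisplay.
    static List<Segment> split(String text, IntPredicate canDisplay) {
        List<Segment> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentSupported = false;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            boolean ok = canDisplay.test(cp);
            // Close the running segment whenever support flips.
            if (current.length() > 0 && ok != currentSupported) {
                segments.add(new Segment(current.toString(), currentSupported));
                current.setLength(0);
            }
            currentSupported = ok;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        // Flush the last segment, if any.
        if (current.length() > 0) {
            segments.add(new Segment(current.toString(), currentSupported));
        }
        return segments;
    }
}
```

Keeping the check as a predicate means the splitting logic can be tried out without any font installed; whether the segments can be written back as separate runs depends on the POI version in use.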

(tested with POI 3.10-beta1)

import java.awt.*;
import java.awt.geom.*;
import java.awt.image.BufferedImage;
import java.io.*;
import org.apache.poi.xslf.usermodel.*;

public class UnicodePPT {
    public static void main(String[] args) throws Exception {
        // create a sample pptx
        XMLSlideShow ss = new XMLSlideShow();
        Dimension pgsize = ss.getPageSize();             

        XSLFSlide slide = ss.createSlide();
        XSLFTextBox tb = slide.createTextBox();
        tb.setShapeType(XSLFShapeType.HEART);
        int shapeSize = 150;
        tb.setAnchor(new Rectangle2D.Double(pgsize.getWidth()/2-shapeSize/2, pgsize.getHeight()/2-shapeSize/2, shapeSize, shapeSize));
        tb.setLineWidth(2);
        tb.setLineColor(Color.BLACK);
        XSLFTextParagraph par = tb.addNewTextParagraph();
        tb.setVerticalAlignment(VerticalAlignment.DISTRIBUTED);
        par.setTextAlign(TextAlign.CENTER);
        XSLFTextRun run = par.addNewTextRun();
        run.setText("/\u02CCin\u0259\u02C8v\u0101SH\u0259n/");
        run.setFontFamily("DejaVu Serif");
        run.setFontSize(12);
        par.addLineBreak();
        run = par.addNewTextRun();
        run.setText("/\u02CCin\u0259\u02C8v\u0101SH\u0259n/");
        run.setFontFamily("GE Inspira");
        run.setFontSize(12);

        // set the font
        GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
        InputStream is = new FileInputStream("src/main/resources/GEInspRg.TTF");
        Font font = Font.createFont(Font.TRUETYPE_FONT, is);
        is.close();
        ge.registerFont(font);  

        is = new FileInputStream("src/main/resources/DejaVuSerif.ttf");
        font = Font.createFont(Font.TRUETYPE_FONT, is);
        is.close();
        ge.registerFont(font);  

        // render it
        double zoom = 2; // magnify it by 2
        AffineTransform at = new AffineTransform();
        at.setToScale(zoom, zoom);

        BufferedImage img = new BufferedImage((int)Math.ceil(pgsize.width*zoom), (int)Math.ceil(pgsize.height*zoom), BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = img.createGraphics();
        graphics.setTransform(at);                
        graphics.setPaint(Color.white);
        graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
        slide.draw(graphics);             

        FileOutputStream fos = new FileOutputStream("unicodeppt.png");
        javax.imageio.ImageIO.write(img, "png", fos);       
        fos.close();
    }
}
kiwiwings
  • The issue seems to be that POI is not able to convert Unicode characters properly. It converts Unicode characters to a square box. So there is an issue with the encoding used by POI. I don't see anything related to the fonts you specified in the example above. I don't think POI is using UTF-8 encoding, otherwise those Unicode characters would have been converted correctly. Any thoughts? – user2565431 Sep 01 '13 at 04:14
  • I've written that example code to test if it has to do something with UTF-8 encoding! You'll see one line with "junk" chars, which uses the GE Inspira font. The other line uses a font which contains the phonetics and is displayed correctly. There might be a problem with Asian characters, as pointed out by Shervin in your [duplicated question](http://stackoverflow.com/questions/18551722/what-encoding-poi-supports), but not with low-range chars such as phonetics. Have you actually tried to get the DejaVu Serif font and run the example above??? – kiwiwings Sep 01 '13 at 05:14
  • kiwiwings - GREAT. So your program above worked; I see the first line converted properly. But how will this help me? In my situation I have a program which reads PPTX files from filers and converts them to images one page at a time. Those PPTX files may contain Unicode, Chinese, Arabic, or any other international characters. When converted to an image they should keep their respective characters and not turn into junk or squares. Let me tell you one more thing: I have all these fonts in one directory. When I used Aspose I was able to specify a fonts directory, but I don't see how to do that with POI. – user2565431 Sep 01 '13 at 15:32
  • Please note that in my case I won't be creating a new PPTX from code; I will be reading an already created PPTX from disk and then converting it to PNG. – user2565431 Sep 02 '13 at 03:10
  • 1) You can register all the fonts, so POI or better the java graphics renderer finds it. 2) As you haven't provided a test-file, I had to create my own to show you that's not a Unicode but a font problem of GE-Inspira. 3) Of course you can open and modify a file so the third approach where you modify the pptx, i.e. replace the font of chars which are not supported by this font would be a workaround. If you can't program such a workaround in the meantime, I'll have a look after my holidays into it (after 9.9.) ... – kiwiwings Sep 02 '13 at 15:39
  • I changed the font in my test PPTX from GE Inspira to Arial, set the font in code to Arial, and it worked. So you are correct: there is some issue with GE Inspira. That means I need to register all the fonts, which could be 40-50. That would be too many lines in the code. In the Aspose API there is just one line to register all the fonts from a directory - FontSettings.setFontsFolder. I don't see that in POI, which means I may need to load all these fonts somehow instead of writing a separate line to register each of the 40-50 fonts in the code. – user2565431 Sep 03 '13 at 16:23
  • So "kiwiwings" - I thank very very much for all your help. You have been great. You gave me the solution. It was a fonts issue. "GE Inspira" font did not contain those characters - like reversed E which is "U+0258 - Latin small letter reversed E". I confirmed that by trying to find U+0258 - Latin small letter reversed E in Windows Character Map and it did not find that reverse E when I select "GE Inspira" font in character Map. You don't know how much you have helped me. This is HUGE HUGE for me. My project would not have gone to production without justifying this. – user2565431 Sep 09 '13 at 18:03
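
Regarding the 40-50 fonts mentioned in the comments: the registration does not need one line per font. A minimal sketch, assuming all fonts are TrueType files with a `.ttf` extension in a single directory; `FontDirectoryLoader`, `listFontFiles`, and `registerAll` are hypothetical names, not part of POI or the JDK. It relies only on `Font.createFont` and `GraphicsEnvironment.registerFont`, which the code above already uses.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of POI or the JDK.
class FontDirectoryLoader {

    // Returns the regular *.ttf files (case-insensitive) directly inside 'dir', sorted by path.
    static List<File> listFontFiles(File dir) {
        File[] files = dir.listFiles((d, name) ->
                name.toLowerCase(Locale.ROOT).endsWith(".ttf") && new File(d, name).isFile());
        List<File> result = new ArrayList<>();
        if (files != null) { // null when 'dir' does not exist or is not a directory
            Arrays.sort(files);
            result.addAll(Arrays.asList(files));
        }
        return result;
    }

    // Registers every font file found in 'dir'; returns how many were newly registered.
    static int registerAll(File dir) throws IOException, FontFormatException {
        GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
        int registered = 0;
        for (File file : listFontFiles(dir)) {
            Font font = Font.createFont(Font.TRUETYPE_FONT, file);
            // registerFont returns false if the font could not be registered,
            // e.g. because a font with the same name is already installed.
            if (ge.registerFont(font)) {
                registered++;
            }
        }
        return registered;
    }
}
```

Calling something like `FontDirectoryLoader.registerAll(new File("/usr/share/fonts"))` once, before the first `slide.draw(graphics)`, should make the fonts available to the renderer. It does not help with characters that a font simply lacks, as was the case with GE Inspira.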