I am trying to extract the text from .pptx files (convert them to plain text) using Apache POI in Java. I'm new to Java, so I don't know much about BufferedReader, InputStream, and so on.
Here is what I tried:
import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
// ... classes and stuff ...
String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();
My goal is to store the extracted text in a variable, but I can't even print it to the console. This is what I get:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69)
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90)
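One observation about the message: a .pptx file is a ZIP container, and ZIP data starts with the bytes `PK`. The `'PK` in the exception therefore suggests that `br1.readLine()` returned the beginning of the file's binary content, which was then passed on as if it were a file name. The following is a minimal, standard-library-only sketch of that effect. The class name `FirstLineOfZip`, its helper methods, and the in-memory archive are made up for illustration and are not part of POI.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: shows what readLine() yields on ZIP data.
public class FirstLineOfZip {

    // Builds a tiny ZIP archive in memory as a stand-in for a .pptx file.
    public static byte[] buildZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("ppt/slides/slide1.xml"));
            zip.write("<a:t>Hello</a:t>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Mirrors the readLine() call in the question: it reads the file's
    // content as text, not the file's name.
    public static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLine(buildZip());
        // The "line" is binary ZIP data beginning with the magic bytes "PK".
        System.out.println(line.startsWith("PK")); // prints "true"
    }
}
```

If that reading is right, passing the path itself (`inputfile`) to `new XSLFSlideShow(...)` rather than `fileName` would likely avoid this exception, and the result of `getText()` could be assigned to a `String` variable instead of being printed. This is untested here and assumes the POI classes imported above.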
Any help would be greatly appreciated!