I am using XSLFPowerPointExtractor to extract text from a .pptx file. However, all the text in the file is returned to me as a single string. Is there any way I can get the text of each slide separately? I am completely new to this, so please give detailed answers.
- Have you looked into Apache POI? – David Brossard Dec 16 '14 at 11:57
- Yes, the PowerPoint extractor is a class in the POI package. It only gives me the getText() method, which returns the entire content of the file as one string. – confused_coder Dec 16 '14 at 12:33
- Did you look at the format of the returned string? I would assume the slides are delimited somehow, and you could split the string on the delimiter. – forgivenson Dec 16 '14 at 12:59
- I did; there is no way to tell one slide apart from another. It's all one long string. – confused_coder Dec 16 '14 at 13:29
1 Answer
I looked at the API documentation, and with XSLFPowerPointExtractor it appears to be all or nothing: its getText() method returns the text of all the slides as one string, which is exactly the behavior you are observing.
A bit more searching showed me that the way to do this is to use a different class, XMLSlideShow, which gives you slide-by-slide access to the presentation.
From each slide you can access its shapes, including the text shapes, and read the text from those. This is explained in another SO question, which I believe will help you resolve your issue: How to get pptx slide notes text using apache poi?
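As a rough illustration of the slide-by-slide approach, here is a minimal sketch. It assumes a reasonably recent Apache POI (poi-ooxml) on the classpath, where XMLSlideShow can be used in try-with-resources. The class name SlideTextDump and the helper textPerSlide are made up for this example, and it only looks at top-level text shapes, so text inside tables or grouped shapes would need extra handling.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextShape;

// Hypothetical example class; not part of Apache POI.
public class SlideTextDump {

    /** Returns one string per slide, in the order the slides appear in the deck. */
    static List<String> textPerSlide(XMLSlideShow ppt) {
        List<String> result = new ArrayList<>();
        for (XSLFSlide slide : ppt.getSlides()) {
            StringBuilder sb = new StringBuilder();
            for (XSLFShape shape : slide.getShapes()) {
                // Only text-bearing shapes (titles, text boxes, placeholders) are read here.
                if (shape instanceof XSLFTextShape) {
                    String text = ((XSLFTextShape) shape).getText();
                    if (text != null && !text.isEmpty()) {
                        if (sb.length() > 0) {
                            sb.append('\n');
                        }
                        sb.append(text);
                    }
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the .pptx file.
        try (FileInputStream in = new FileInputStream(args[0]);
             XMLSlideShow ppt = new XMLSlideShow(in)) {
            List<String> slides = textPerSlide(ppt);
            for (int i = 0; i < slides.size(); i++) {
                System.out.println("--- Slide " + (i + 1) + " ---");
                System.out.println(slides.get(i));
            }
        }
    }
}
```

Running it as `java SlideTextDump deck.pptx` (with the POI jars on the classpath) should print each slide's text under its own header, instead of the single combined string that the extractor's getText() returns.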

David Brossard