OfficeJS Extracting all text from every slide of a PowerPoint document

Question

We want to extract all text of an opened PowerPoint document from an add-in.

In Word, we do the following:

Word.run( context => {
  var paragraphs = context.document.body.paragraphs;
  context.load(paragraphs, 'text');
  return context.sync().then( () => {
    // Each item is a paragraph whose text property has been loaded
    var items = paragraphs.items;
    // Do something with items
  });
});

We're only interested in the text of the PowerPoint document.

We haven't found much documentation on how to use the API for PowerPoint specifically (this is all we found). This sample project comes close to what we want in that it extracts data from the document, but we were hoping to do it without loading the compressed file and parsing the text from the file chunks.

How might we best approach this?

score 1 · Accepted Answer · answered Nov 06 '17 at 14:36

I'm afraid this isn't possible. The Office.js reference documentation can be filtered to show only the PowerPoint APIs, but the functionality is somewhat primitive. In this case, there isn't an API that allows you to iterate through the objects in the document like you can in Word or Excel.

As you found in that sample, you can use the Document.getFileAsync method to retrieve the raw OOXML. Parsing OOXML isn't quite as painful as it seems at first (it's just XML). The bigger challenge is that once you have the OOXML, there isn't a way to push changes you make to it back into PowerPoint. It is effectively a read-only operation.
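As a rough illustration of the getFileAsync approach, the sketch below reads the file slice by slice and joins the bytes. The function name `readFileBytes` and its parameters are hypothetical. It assumes the caller passes `Office.context.document` and `Office.FileType.Compressed`, and it compares the status against the string `"succeeded"`, which is the value of `Office.AsyncResultStatus.Succeeded`.

```javascript
// Sketch only: read every slice of the open document via getFileAsync.
// `officeDocument` is expected to be Office.context.document and `fileType`
// Office.FileType.Compressed; both are passed in so that nothing here
// depends on a global `Office` object.
function readFileBytes(officeDocument, fileType, sliceSize) {
  return new Promise(function (resolve, reject) {
    var options = { sliceSize: sliceSize || 65536 };
    officeDocument.getFileAsync(fileType, options, function (result) {
      // "succeeded" is the string value of Office.AsyncResultStatus.Succeeded.
      if (result.status !== "succeeded") {
        reject(result.error);
        return;
      }
      var file = result.value;
      var bytes = [];

      function readSlice(index) {
        if (index >= file.sliceCount) {
          // Always close the file handle once all slices are read.
          file.closeAsync(function () {
            resolve(new Uint8Array(bytes));
          });
          return;
        }
        file.getSliceAsync(index, function (sliceResult) {
          if (sliceResult.status !== "succeeded") {
            file.closeAsync(function () {
              reject(sliceResult.error);
            });
            return;
          }
          // For the compressed format, slice data is an array of byte values.
          var data = sliceResult.value.data;
          for (var i = 0; i < data.length; i++) {
            bytes.push(data[i]);
          }
          readSlice(index + 1);
        });
      }

      readSlice(0);
    });
  });
}
```

In an add-in this would be called as `readFileBytes(Office.context.document, Office.FileType.Compressed)`. The resulting bytes are the `.pptx` package itself, so they still need to be unzipped before any XML can be read.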

I would strongly suggest visiting the Office Dev UserVoice site and adding your suggestions. The UserVoice is regularly reviewed by the product teams and is the best method of ensuring the PowerPoint team is made aware of the limitations you're running into with the API.

Marc LaFleur

Thanks for the reply! I am struggling to get the binary compressed XML into a string XML from which I can parse the slide contents. I am using the following approach to get the data: https://dev.office.com/reference/add-ins/shared/document.getfileasync#example---get-a-document-in-office-open-xml-compressed-format When logging the fileContent variable at the end it's still in bytes and I haven't managed to parse it back into OOXML. Is this due to the compression? – IronLionZion Nov 07 '17 at 20:40
When you tell it to return the file using `Compressed` it is sending you a `.pptx` file which is actually `.zip` file containing the OOXML and other assorted assets. You need to unpack the file before you can access the raw content. – Marc LaFleur Nov 07 '17 at 20:48
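Once the package has been unpacked with a zip library, each slide is an XML part (for example `ppt/slides/slide1.xml`) and its visible text sits in `<a:t>` runs inside `<a:p>` paragraphs. The sketch below shows one way to pull that text out of a single slide's XML string. The function names are hypothetical, the regular expressions assume the usual `a:` namespace prefix, and a real XML parser such as the browser's DOMParser would be more robust.

```javascript
// Sketch only: extract the text of one unpacked slide part.
// Assumes the DrawingML namespace uses the conventional "a:" prefix.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // done last so "&amp;lt;" becomes "&lt;"
}

function extractSlideText(slideXml) {
  // <a:p> is a paragraph; the [\s>] keeps <a:pPr> and similar tags out.
  var paragraphPattern = /<a:p[\s>][\s\S]*?<\/a:p>/g;
  var lines = [];
  var paragraph;
  while ((paragraph = paragraphPattern.exec(slideXml)) !== null) {
    // <a:t> holds the text of one run; concatenate runs per paragraph.
    var runPattern = /<a:t>([\s\S]*?)<\/a:t>/g;
    var text = "";
    var run;
    while ((run = runPattern.exec(paragraph[0])) !== null) {
      text += decodeXmlEntities(run[1]);
    }
    if (text !== "") {
      lines.push(text);
    }
  }
  return lines;
}
```

Running `extractSlideText` over every `ppt/slides/slideN.xml` part gives one array of paragraph strings per slide.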
That makes sense. Thanks again. Also, I'd make a new thread for this, but perhaps it's more useful if you update the answers on the old threads: How can we find out the host (Word, Excel, ...) that the add-in is being run on? I see many references to "the new API" on StackOverflow but no apparent updates since its release (and the relevant linked GitHub pages no longer exist). From scanning the docs it seems like this is the proper way to check for Word 2016: Office.context.requirements.isSetSupported('WordApi') Could you please confirm that or point me in the right direction if I'm mistaken? – IronLionZion Nov 07 '17 at 21:24
I found this info from here: https://dev.office.com/docs/add-ins/overview/specify-office-hosts-and-api-requirements#runtime-checks-using-methods-not-in-a-requirement-set But it's honestly a bit confusing because you have Word 2016, Word 2013, Word Online, etc. and it seems we need to use something different for older versions (e.g. Word 2013). Is this the same method as referred to in the following post? https://stackoverflow.com/questions/37785156/how-to-find-the-office-addin-host-it-is-a-word-application-or-excel-using-office I also can't figure out how to check if the host is PowerPoint. – IronLionZion Nov 07 '17 at 21:39
Regarding unpacking in javascript, I solved that for a Word document [here](https://stackoverflow.com/questions/30183285/office-task-pane-app-how-to-get-the-whole-document-in-an-ooxml-string/38179837#38179837). Maybe you can use it for something. – jkh Nov 08 '17 at 06:45
@jkh That's exactly what I need! Thanks! (Now to get zip.js code reviewed and approved for prod...) – IronLionZion Nov 09 '17 at 22:48
As suggested by @Marc, I took the liberty and opened a new request [here](https://powerpoint.uservoice.com/forums/288955-powerpoint-for-android/suggestions/33842533-powerpoint-js-add-in-allow-extraction-of-all-the-o) – raks Apr 04 '18 at 01:48
