
I am trying to get the page count of a Word document in OpenXML format. I have gotten to the point where I have the XML structure of the document in a readable format, but I can't find where the page count property is stored. Any guidance would be appreciated.

UPDATE: The page count and other metadata are stored in the docProps/app.xml file inside the .docx archive. All you have to do is extract that file and pull out the value you want. I got the page count like this:

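// `data` is assumed to be the path to the extracted docProps/app.xml file.
const XMLData = fs.readFileSync(data, { encoding: "utf-8" });

// Take the text between <Pages> and </Pages>.
let pageCount = XMLData.split("<Pages>")
  .join(",")
  .split("</Pages>")
  .join(",")
  .split(",")[1];

// Original script: download the .docx and unzip it.
const fs = require("fs");
const path = require("path");
const axios = require("axios");
const unzipper = require("unzipper");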

let noRepeatDocs = ['somewebsite.com/somedocument.docx'];


const writeTheFile = async (data) => {
  fs.writeFileSync("read_word_doc", data);
};

const unzipTheFile = async (data) => {
  // Returning the parser's promise lets the caller await the end of parsing.
  return fs
    .createReadStream(data)
    .pipe(unzipper.Parse())
    .on("entry", function (entry) {
      const fileName = entry.path;

      // Keep only the main document part and discard every other entry.
      if (fileName === "word/document.xml") {
        entry.pipe(fs.createWriteStream("./output"));
      } else {
        entry.autodrain();
      }
    })
    .promise();
};

const getWordBuffer = async (arr) => {
  for (const wordDocLink of arr) {
    const response = await axios({
      url: wordDocLink,
      method: "GET",
      responseType: "arraybuffer",
    });
    const data = response.data;
    await writeTheFile(data);
    await unzipTheFile("./read_word_doc"); 
  }
};

getWordBuffer(noRepeatDocs);
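
For the update above, a regular expression may be a simpler way to pull the number out than chaining `split` and `join`. The following is only a minimal sketch: `extractPageCount` and `sampleAppXml` are hypothetical names that are not part of the original script, and the input is assumed to be the text of docProps/app.xml already read into a string. The `<Pages>` element is a value cached by whichever application last saved the file, so it can be missing or stale.

```javascript
// Hypothetical helper (not from the original post): pull the cached page
// count out of the text of docProps/app.xml.
function extractPageCount(appXml) {
  // Match <Pages>12</Pages>, tolerating whitespace around the number.
  const match = /<Pages>\s*(\d+)\s*<\/Pages>/.exec(appXml);
  // The element is optional, so report null instead of guessing.
  return match ? Number.parseInt(match[1], 10) : null;
}

// Made-up sample input shaped like a trimmed app.xml.
const sampleAppXml =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">' +
  "<Pages>3</Pages><Words>512</Words></Properties>";

console.log(extractPageCount(sampleAppXml)); // 3
console.log(extractPageCount("<Properties></Properties>")); // null
```

Returning `null` when the element is absent keeps a missing value distinguishable from a real count.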
Gavin Coulson
  • You cannot get a reliable page count from a static Word document. See [Can't get the pages count from a word document with OpenXml](https://stackoverflow.com/q/64820951/290085) and [How to access OpenXML content by page number?](https://stackoverflow.com/q/39992870/290085) for further details. – kjhughes Dec 21 '21 at 20:22
  • Not really, they're using .NET and the OpenXML SDK there. I want to read over the XML data in NodeJS and get the page count from the XML data. Thanks anyway! – Gavin Coulson Dec 21 '21 at 20:24
  • It doesn't matter whether you're using .NET or NodeJS, the information you seek isn't stored statically in any reliable way in a DOCX file. See the links in my previous comment, especially the second one. – kjhughes Dec 21 '21 at 20:25
  • Read more, thanks! Super helpful. – Gavin Coulson Dec 21 '21 at 20:27

0 Answers