0

Every week, I'll be receiving a set of pdf files from my clients.

They will paste the pdf files in the specific google drive folder. I need a total number of pages of the pdf file. I was trying to create a code in Apps script which will helps to update the pdf file name and the total number of pages in the particular Google sheet.

I found the code which was created for the google docs here and here.

But that doesn't work. I am looking for a Apps script which helps to check the particular drive folder and update the pdf file name and the total number of pages in the specific google sheet.

I have tried to below script.

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    Logger.log(file.getName());
    Logger.log(file.length);
  }
}

But the length option is not working of pdf file....

Thanks in advance.

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • PDFs are blob content, which means you will need to interpret the bytes of the target file. The number of pages is a function of the PDF's layout - not something that Google did or has a function for. – tehhowch Jan 03 '19 at 14:20
  • @ashokkumar ponnnusamy I apologize I couldn't help you. – Tanaike Jan 05 '19 at 22:59

2 Answers2

4

Unfortunately, there are no methods for directly retrieving the total pages from a PDF file using Google APIs yet. So how about these workarounds? Please choose it for your situation.

Workaround 1:

In this workaround, it retrieves the number of content streams in the PDF file. The content streams is shown as the attribute of /Contents.

When this is reflected to your script, it becomes as follows.

Modified script:

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    var n = file.getBlob().getDataAsString().split("/Contents").length - 1;
    Logger.log("fileName: %s, totalPages: %s", file.getName(), n)
  }
}
  • Although this workaround is simple, it might be able to not use for all PDF files as @mkl says. If this workaround cannot be used for your PDF files, how about the following workaround 2?

Workaround 2:

In this workaround, an API is used for retrieving the total pages of PDF file. I used Split PDF API. The total pages are retrieved from the number of splitted files. When you use this API, please check ConvertAPI and retrieve your secret key.

Modified script:

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    var url = "https://v2.convertapi.com/convert/pdf/to/split?Secret=#####"; // Please set your secret key.
    var options = {
      method: "post",
      payload: {File: DriveApp.getFileById(file.getId()).getBlob()},
    }
    var res = UrlFetchApp.fetch(url, options);
    res = JSON.parse(res.getContentText());
    Logger.log("fileName: %s, totalPages: %s", file.getName(), res.Files.length)
  }
}
  • I'm not sure about the number of PDF files and file size. So I didn't use fetchAll method for this. This is a sample script. So please modify this for your situation.

Note:

  • I can use these workarounds in my applications. But I have not been able to confirm for all PDF files. So if these workarounds didn't work for your PDF files, I'm sorry.

Reference:

Workaround 3:

As another approach, when this method is used, the sample script for retrieving the number of pages of PDF data is as follows.

async function myFunction() {
  const cdnjs = "https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText()); // Load pdf-lib
  const setTimeout = function (f, t) {
    // Overwrite setTimeout with Google Apps Script.
    Utilities.sleep(t);
    return f();
  };

  const myFolder = DriveApp.getFoldersByName("Test").next();
  const files = myFolder.searchFiles('title contains ".PDF"');
  const ar = [];
  while (files.hasNext()) {
    ar.push(files.next())
  }
  for (let i = 0; i < ar.length; i++) {
    const file = ar[i];
    const pdfData = await PDFLib.PDFDocument.load(new Uint8Array(file.getBlob().getBytes()));
    const n = pdfData.getPageCount();
    console.log("fileName: %s, totalPages: %s", file.getName(), n);
  }
}

Note:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • There may be more or less pages in a pdf than there are occurrences of `/Contents` in it. As the op does not control the upload by the clients (at least he does not indicate otherwise), he does not know whether the pdfs his clients upload are generated similarly to yours. Thus, your method might return inaccurate results in his case. – mkl Jan 04 '19 at 06:01
  • @mkl Thank you for your advice. Yes. So I added a Note section and proposed a simple workaround for Google Apps Script that I'm also using. But from your advice, I added one more workaround. It uses an API. By this, OP can select the workaround for OP's situation. How do you think about this? – Tanaike Jan 04 '19 at 08:07
  • Chances are that that second workaround is more accurate but you rely on an external api and have to consider the case of the api not being available. Oh well... – mkl Jan 04 '19 at 08:57
  • @mkl Thank you for replying. I apologize for my poor skill. I would like to study more. – Tanaike Jan 04 '19 at 11:48
  • No need to apologize for anything. External apis simply have this disadvantage of probably not being available all the time. Thus, using them means that one's own code also has a limited availability only. This is not necessarily a KO but one has to be aware of it and tell one's clients. – mkl Jan 04 '19 at 13:36
  • @mkl Thank you for the additional advice. I could obtain the important information from them. – Tanaike Jan 04 '19 at 23:29
  • Thank you so much @Tanaike and mkl. I have just created a script. Tanaike, I really appreciate your help. This (file.getBlob().getDataAsString().split("/Contents").length - 1;) is the trum card for my script. Once again, Thank you. – ashokkumar ponnnusamy Jan 05 '19 at 00:04
  • @ashokkumar ponnnusamy Thank you for your response. When you use the workaround 1, please sometimes check whether the retrieved total-pages are correct. There might be the case that the content streams cannot be used as mkl says. I worry about this. – Tanaike Jan 05 '19 at 00:28
  • @Tanaike Sure. Thank you. – ashokkumar ponnnusamy Jan 05 '19 at 00:44
-2
function menuItem() {
var folder = 
DriveApp.getFoldersByName('Test').next();
var contents = folder.searchFiles('title contains ".PDF"');
var file;
var name;
var sheet = SpreadsheetApp.getActiveSheet();
var count;

sheet.clear();
sheet.appendRow(["Name", "Number of pages"]);

while(contents.hasNext()) {
file = contents.next();
name = file.getName();
count = 
file.getBlob().getDataAsString().split("/Contents").length - 1;

data = [name, count]
sheet.appendRow(data);
}
};


function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('PDF Page Calculator')
          .addItem("PDF Page Calculator", 
'menuItem')
.addToUi();
};