Is there a way to slice a Google doc into multiple PDFs?

Question

I would like to replicate, in Google Apps Script, VBA code that I wrote for Word documents. Basically, it "slices" a document into multiple PDF files by searching for tags that I insert into the document. The purpose is to allow choirs using forScore (an app that manages musical scores) to insert previously annotated musical scores at the slice points. That way, during a performance they can page through a musical program in its entirety, from start to finish.
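The core of the slicing logic does not depend on the Docs API. As a minimal sketch, assuming the document has already been read into an array of paragraph strings, each segment runs from the paragraph after the previous slice tag up to and including the next tag. The helper name `splitAtSliceTags` and the sample paragraphs (including the `#LCH-Gradual#` tag) are made up for illustration:

```javascript
// Split an array of paragraph strings into segments, closing a segment
// at every paragraph that contains the slice-tag prefix.
// Paragraphs after the last tag form a final, untagged segment.
function splitAtSliceTags(paragraphs, tagPrefix) {
  const segments = [];
  let current = [];
  for (const text of paragraphs) {
    current.push(text);
    if (text.indexOf(tagPrefix) !== -1) {
      segments.push(current); // the tag paragraph ends this slice
      current = [];
    }
  }
  if (current.length > 0) {
    segments.push(current); // keep trailing content after the last tag
  }
  return segments;
}

// Hypothetical sample data: two tagged slices followed by leftover text.
const paragraphs = [
  "Welcome",
  "#LCH-Introit (Seven Seasonal-God is Ascended)#",
  "Reading",
  "Psalm",
  "#LCH-Gradual#",
  "Postlude notes",
];
console.log(splitAtSliceTags(paragraphs, "#LCH-"));
```

In Apps Script, the input array could come from `body.getParagraphs().map(function (p) { return p.getText(); })`. This splits text only; it does not preserve formatting, images, or page boundaries.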

This is what I have so far. I know it's sketchy, and it doesn't actually work as is:

// Loop through body elements searching for the slice tags
// Slice tag example: #LCH-Introit (Seven Seasonal-God is Ascended)#
// When a slice tag is found, a document is created with a subsection of the document

// serviceName, fName and padDigits() (a function I wrote to pad zeroes)
// are defined elsewhere and omitted from this snippet.
function sliceDocument() {
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  var count = 0;

  // NOT SURE USING RANGES IS WHAT'S NEEDED
  var rangeBuilder = doc.newRange();
  for (var i = 0; i < body.getNumChildren(); i++) {
    var child = body.getChild(i);
    rangeBuilder.addElement(child);
    // Logger.log(fName + ": element(" + i + ") type = " + child.getType());
    if (child.getType() === DocumentApp.ElementType.PARAGRAPH) {
      var foundElement = child.asParagraph().findText('#LCH-');
      if (foundElement) {
        var txt = foundElement.getElement().getParent().editAsText().getText();
        // Generate PDF file name from the text between the first '-' and the closing '#'
        count++;
        var padNum = padDigits(count, 2);
        var pdfName = serviceName + '-' + padNum + ' ' + txt.substring(txt.indexOf('-') + 1, txt.length - 1);
        Logger.log(fName + ": pdfName = " + pdfName);
        // Create new doc and populate with current Elements
        // Here's where I'm stuck.
        // There doesn't seem to be a way to reference page numbers in Google Apps Script.
        // That would make things much easier.
        // var docTemp = DocumentApp.create(pdfName);
      }
    }
  }
}
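The file-name step can be isolated and checked on its own. This is a sketch, not the author's code: `padDigits` here is a hypothetical stand-in for the zero-padding helper mentioned in the snippet, and `makePdfName` assumes the title starts after the first `-` and that the tag ends with a closing `#`. It uses `substring` rather than the legacy `substr`, which selects the same characters:

```javascript
// Hypothetical stand-in for the author's padDigits(): left-pad with zeros.
function padDigits(num, digits) {
  let s = String(num);
  while (s.length < digits) {
    s = "0" + s;
  }
  return s;
}

// Build "<service>-<NN> <title>" from a tag such as
// "#LCH-Introit (Seven Seasonal-God is Ascended)#".
// Assumes the title starts after the first '-' and the tag may end with '#'.
function makePdfName(serviceName, count, tagText) {
  const start = tagText.indexOf("-") + 1;
  const end = tagText.endsWith("#") ? tagText.length - 1 : tagText.length;
  return serviceName + "-" + padDigits(count, 2) + " " + tagText.substring(start, end);
}

console.log(makePdfName("Ascension", 1, "#LCH-Introit (Seven Seasonal-God is Ascended)#"));
```

With the hypothetical service name `Ascension`, this logs `Ascension-01 Introit (Seven Seasonal-God is Ascended)`; the hyphen inside the title is kept because only the first `-` is used as the split point.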

Can you explain your script? From your script, I couldn't understand what you mean by `it doesn't actually work`. — Tanaike, Aug 07 '19 at 23:17
Sorry, that wasn't very clear. This is just a code snippet. For example, I left out the 'doc' and 'body' variable definitions and how they get their values (I wasn't sure how much detail was required). As it stands, the name of the eventual file (pdfName) has been generated, but that's it. Would it be better if I redid the code? — sbaptista, Aug 08 '19 at 07:27
If I understood correctly, does each new PDF have only the paragraph where the word was found? — Jescanellas, Aug 08 '19 at 09:51
Thank you for replying. Unfortunately, from your script, I still cannot understand the input and output you want. I deeply apologize for my poor English skill. — Tanaike, Aug 08 '19 at 11:39
OK, I've really done a poor job of explaining what I'd like to do. Too many details were left out to make it clear. A problem for me is that I'm not sure what Google Apps Script code would work in this situation. @Jescanellas: Each PDF slice would have all the pages in the document since the last slice. So let's say I find the first slice tag on page 5. I would create a PDF of pages 1-5. If the next slice was found on page 9, then I would create a PDF of pages 6-9, and so on. Perhaps it would be best to provide pseudocode and ask how it could be implemented in Google Apps Script? — sbaptista, Aug 08 '19 at 19:28
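Once the page of each slice tag is known (the hard part in Google Docs), turning those pages into the ranges described here is simple. A minimal sketch, assuming the tag pages are 1-based, sorted, and that there is at most one tag per page; `pageRanges` is a hypothetical helper name:

```javascript
// Convert the pages on which slice tags were found into inclusive
// [startPage, endPage] ranges: each slice starts on the page after the
// previous slice ended. Assumes sorted pages with at most one tag per page.
function pageRanges(tagPages) {
  const ranges = [];
  let start = 1;
  for (const page of tagPages) {
    ranges.push([start, page]);
    start = page + 1;
  }
  return ranges;
}

console.log(pageRanges([5, 9]));
```

For tags on pages 5 and 9 this yields `[[1, 5], [6, 9]]`, matching the example in the comment.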
Thank you for replying. I noticed that an answer has already been posted. I think that it will resolve your issue. — Tanaike, Aug 09 '19 at 23:56

score 1 · Accepted Answer · answered Aug 09 '19 at 10:56

I think I have the solution for this. The following script searches for a certain word in the text, and every time it finds it, it creates a new Doc in your Drive with the text between occurrences of the word. Keep in mind that the item containing the word is also copied into the new Doc; if you want to remove it, I can update the answer with the code for that.

As I wasn't sure whether you wanted to download the PDFs or keep them in Drive, the script logs links to download those files in PDF format. You can find them in View > Logs after the execution finishes.

function main() {

  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  var listItems = body.getListItems();
  var docsID = [];

  var content = [];

  for (var i = 0; i < listItems.length; i++) { // Iterates over all the list items of the file

    content.push(listItems[i].getText()); // Saves the text into the array we will use to fill the new doc

    if (listItems[i].findText("TextToFind")) {
      createDoc(content, i, docsID);
      content = []; // We empty the array to start over
    }

  }
  linksToPDF(docsID);

}


function createDoc(content, index, docsID) {

  var newdoc = DocumentApp.create("Doc number " + index); // Creates a new doc for every segment of text
  docsID.push(newdoc.getId());

  var newbody = newdoc.getBody();

  for (var j = 0; j < content.length; j++) {
    newbody.appendListItem(content[j]); // Fills the doc with text
  }
  newdoc.saveAndClose(); // Flushes the changes before the doc is exported

}

function linksToPDF(docsID) { // Logs a PDF export link for every new doc
  Logger.log("PDF Links: ");
  for (var i = 0; i < docsID.length; i++) {
    Logger.log("https://docs.google.com/document/d/" + docsID[i] + "/export?format=pdf");
  }
}

If you have many files and want to download the PDFs automatically, you should deploy the script as a web app. If you need help with that, I can update my code too.

Thanks for your reply. In the document I use for testing, there don't appear to be any ListItems (listItems.length == 0). I've verified that there is content in the document (body.getNumChildren() = 167). All of the children are of type PARAGRAPH except two that are UNSUPPORTED. — sbaptista, Aug 10 '19 at 17:03
I was able to create PDF files, but they are basically text strings, not the document with all of its components up to the search string. What I'm looking for is some way to determine what page the search string is on and then create a PDF from that. So if strings were found on pages 4, 9, and 20, I want to create PDF files of pages 1-4, 5-9, and 10-20. I'm now wondering if it's better to export a PDF file of the entire document (which I know how to do) and then find a JavaScript solution for searching the PDF file instead. — sbaptista, Aug 11 '19 at 03:37
I see. The problem is that the Docs API doesn't work with page numbers: in the documentation they appear as an [UnsupportedElement Class](https://developers.google.com/apps-script/reference/document/unsupported-element), which means they can't be affected by a script. The good news is that `Paragraph` has the same methods as `ListItem`, so in my previous post, if you replace the word `ListItem` with `Paragraph` (for example, `getListItems()` becomes `getParagraphs()`), it still works. — Jescanellas, Aug 12 '19 at 07:07
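Because `Paragraph` and `ListItem` both provide `getText()` and `findText()`, the answer's loop can be written against either kind of element. The sketch below is an assumption-laden variant, not the answerer's code: it groups the elements themselves rather than their text (so each segment could later be copied with formatting), and, unlike the loop in the answer, it keeps whatever follows the last match. `collectSegments` and `mockElement` are hypothetical names; the mock exists only so the logic can run outside Apps Script:

```javascript
// Group elements (Paragraph or ListItem) into segments, closing a segment
// at every element where findText() returns a match.
// In Apps Script, findText() returns a RangeElement or null.
function collectSegments(elements, searchPattern) {
  const segments = [];
  let current = [];
  for (let i = 0; i < elements.length; i++) {
    current.push(elements[i]);
    if (elements[i].findText(searchPattern)) {
      segments.push(current);
      current = [];
    }
  }
  if (current.length > 0) {
    segments.push(current); // elements after the last match
  }
  return segments;
}

// Hypothetical mock exposing only the two methods used above.
// The real findText() takes a regular-expression pattern as a string.
function mockElement(text) {
  return {
    getText: function () { return text; },
    findText: function (pattern) { return new RegExp(pattern).test(text) ? {} : null; },
  };
}

const items = ["Intro", "TextToFind here", "More", "End"].map(mockElement);
console.log(collectSegments(items, "TextToFind").length);

// In Apps Script (not runnable here), a segment of Paragraphs could then be
// copied with its formatting into a new doc's body, e.g.:
//   segment.forEach(function (p) { newBody.appendParagraph(p.copy()); });
```

Called as `collectSegments(body.getParagraphs(), "TextToFind")`, this would work on the all-paragraph test document. Copying with `copy()` keeps text formatting, but it still cannot reproduce page boundaries, since the page numbers remain unsupported elements.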
