
I'm trying to combine several Google Docs into one, but images from the original documents are inserted twice: one copy at the right location, the other at the end of the newly created doc.

From what I saw, the script detects these images as Paragraph elements.

As you can see in my code below, I was inspired by similar topics found here. One of them suggested searching for a child Element inside the Paragraph element, but debugging showed that there is none. The affected part of the doc is therefore always inserted with the appendParagraph method, since the script cannot properly detect the image.

This is why the other relevant topic I found cannot work here: it suggested inserting the image before the paragraph itself, but the script cannot detect the image.

Logging with both the default Logger and Stackdriver's console.log displays an object typed as Paragraph. Stepping through the execution did not show any loop calling the appendParagraph method twice.
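
For anyone reproducing this, the inspection described above can be sketched as follows. `describeParagraph` is a hypothetical helper name, not part of the Apps Script API. It only assumes that the object it receives offers `getNumChildren()`, `getChild()` and `getPositionedImages()`, as the Paragraph class does, so it reports both the child elements and any positioned images attached to the paragraph.

```javascript
// Hypothetical debugging helper (not part of the Apps Script API).
// Summarises what a Paragraph-like object contains: the type of each
// child element and the number of positioned images attached to it.
function describeParagraph(paragraph) {
  var childTypes = [];
  for (var i = 0; i < paragraph.getNumChildren(); i++) {
    // String() keeps the output readable whether getType() returns an enum or a string
    childTypes.push(String(paragraph.getChild(i).getType()));
  }
  return {
    childTypes: childTypes,
    positionedImages: paragraph.getPositionedImages().length
  };
}
```

Inside the merge loop it could be called as `Logger.log(JSON.stringify(describeParagraph(fileContent.getChild(j).asParagraph())))`, but only for elements whose type is PARAGRAPH.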

/* chosenParts contains a list of Google Document names */
function concatChosenFiles(chosenParts) {
  var folders = DriveApp.getFoldersByName(folderName);
  var folder = folders.hasNext() ? folders.next() : false;
  var parentFolders = folder.getParents();
  var parentFolder = parentFolders.next();
  var file = null;
  var gdocFile = null;
  var fileContent = null;
  var offerTitle = "New offer";
  var gdocOffer = DocumentApp.create(offerTitle); 
  var gfileOffer = DriveApp.getFileById(gdocOffer.getId()); // transform Doc into File in order to choose its path with DriveApp
  var offerHeader = gdocOffer.addHeader();
  var offerContent = gdocOffer.getBody();
  var header = null;
  var headerSubPart = null;
  var partBody= null;
  var style = {};

  parentFolder.addFile(gfileOffer); // place current offer inside generator folder
  DriveApp.getRootFolder().removeFile(gfileOffer); // remove from home folder to avoid copy

  for (var i = 0; i < chosenParts.length; i++) {
    // First retrieve Document to combine
    file = folder.getFilesByName(chosenParts[i]);
    file = file.hasNext() ? file.next() : null;
    gdocFile = DocumentApp.openById(file.getId());

    header = gdocFile.getHeader();
    // set Header from first doc
    if ((0 === i) && (null !== header)) {
      for (var j = 0; j < header.getNumChildren(); j++) {
        headerSubPart = header.getChild(j).copy();
        offerHeader.appendParagraph(headerSubPart); // Assume header content is always a paragraph
      }
    }

    fileContent = gdocFile.getBody();

    // Analyse file content and insert each part inside the offer with the right method
    for (var j = 0; j < fileContent.getNumChildren(); j++) {

      // There is a limit, somewhere between 50 and 100 unsaved changes, beyond which the script
      // won't continue until a batch is committed.
      if (j % 50 == 0) {
        gdocOffer.saveAndClose();
        gdocOffer = DocumentApp.openById(gdocOffer.getId());
        offerContent = gdocOffer.getBody();
      }

      partBody = fileContent.getChild(j).copy();     
      switch (partBody.getType()) {
        case DocumentApp.ElementType.HORIZONTAL_RULE:
          offerContent.appendHorizontalRule();
          break;
        case DocumentApp.ElementType.INLINE_IMAGE:
          offerContent.appendImage(partBody);
          break;
        case DocumentApp.ElementType.LIST_ITEM:
          offerContent.appendListItem(partBody);
          break;
        case DocumentApp.ElementType.PAGE_BREAK:
          offerContent.appendPageBreak(partBody);
          break;
        case DocumentApp.ElementType.PARAGRAPH:
          // Search for an image inside the paragraph
          if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE) 
          {
            offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
          } else {
            offerContent.appendParagraph(partBody.asParagraph());
          }
          break;
        case DocumentApp.ElementType.TABLE:
          offerContent.appendTable(partBody);
          break;
        default:
          style[DocumentApp.Attribute.BOLD] = true;
          offerContent.appendParagraph("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.").setAttributes(style);
          console.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
          Logger.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
      }
    }
    // page break at the end of each part.
    offerContent.appendPageBreak();
  }
}

The problem occurs no matter how many files are combined; a single one is enough to reproduce it.

If there is only one image in the file (no spaces or line feeds around it) and "appendPageBreak" is not called afterward, the problem does not occur. When some text sits next to the image, the image is duplicated.

One last thing: someone suggested that it is "due to natural inheritance of formatting", but I did not find how to prevent that.

Many thanks to everyone who'll be able to take a look at this :)

Edit: I adapted the paragraph section following @ziganotschka's suggestions.

It is very similar to this subject, except that its solution does not work here.

Here is the new piece of code:


        case DocumentApp.ElementType.PARAGRAPH:
          // Search for an image inside the paragraph
          if(partBody.asParagraph().getPositionedImages().length) {
            // Assume only one image per paragraph (@TODO : to improve)
            tmpImage = partBody.asParagraph().getPositionedImages()[0].getBlob().copyBlob();
            // remove image from paragraph in order to add only the paragraph
            partBody.asParagraph().removePositionedImage(partBody.asParagraph().getPositionedImages()[0].getId());
            tmpParagraph = offerContent.appendParagraph(partBody.asParagraph());
            // Then add the image afterward, without text
            tmpParagraph.addPositionedImage(tmpImage);

          } else if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE) {
            offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
          } else {
            offerContent.appendParagraph(partBody.asParagraph());
          }
          break;

Unfortunately, it still duplicates the image. And if I comment out the line that inserts the image (tmpParagraph.addPositionedImage(tmpImage);), then no image is inserted at all.

Edit 2: it is a known bug in Google Apps Script.

https://issuetracker.google.com/issues/36763970

See the comments for some workarounds.

vrcAlbert
  • I reproduced your code and did not encounter the behavior you describe. Can you provide a sample document that reproduces the issue? – ziganotschka Aug 26 '19 at 13:59
  • Thanks for your time; I'm surprised you could not reproduce it :-/ Here's a link to a file which triggers the wrong behaviour: https://docs.google.com/document/d/1-RZ-rfxV1oG9AVNlDSsW7XQxAed4JDlaPEXINeQL6Qs/edit?usp=sharing – vrcAlbert Aug 27 '19 at 13:01

2 Answers


Your image is embedded as 'Wrap text' rather than as an inline image.

This is why you cannot retrieve it with getBody().getImages().

Instead, you can retrieve it with getBody().getParagraphs()[index].getPositionedImages()

I am not sure why exactly your image is copied twice, but as a workaround you can make a copy of the image and insert it as an inline image with

getBody().insertImage(childIndex, getBody().getParagraphs()[index].getPositionedImages()[index].copy());

And subsequently

getBody().getParagraphs()[index].getPositionedImages()[index].removeFromParent();

Obviously, you will need to loop through all the paragraphs and check whether each one has embedded positioned images, in order to retrieve them with the right index and proceed.
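
The loop over all paragraphs could look roughly like the sketch below. It is an illustration under assumptions, not tested against a real document: `convertPositionedToInline` is a hypothetical helper name, and it uses `getBlob()`, `removePositionedImage(id)` and the paragraph's `appendInlineImage()` rather than `insertImage(childIndex, ...)`, so that no child index has to be tracked. Each image ends up inline at the end of its own paragraph, which changes the original layout.

```javascript
// Sketch only: `body` is expected to behave like a DocumentApp Body.
// convertPositionedToInline is a hypothetical helper name.
function convertPositionedToInline(body) {
  var converted = 0;
  var paragraphs = body.getParagraphs();
  for (var p = 0; p < paragraphs.length; p++) {
    var paragraph = paragraphs[p];
    var images = paragraph.getPositionedImages();
    for (var i = 0; i < images.length; i++) {
      // Take the image data before detaching the positioned image
      var blob = images[i].getBlob();
      paragraph.removePositionedImage(images[i].getId());
      // Re-add the image inline, at the end of the same paragraph
      paragraph.appendInlineImage(blob);
      converted++;
    }
  }
  // Number of positioned images that were converted
  return converted;
}
```

In a script this would be called on the source body before merging, for example `convertPositionedToInline(gdocFile.getBody())`, ideally on a copy of the source document so that the original stays untouched.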

ziganotschka
  • Thanks again for your time :) With the adjustments you suggest, the image is not duplicated but added as an `InlineImage` (which is the right behaviour for `insertImage`). But an `InlineImage` changes the format I had in the original document, and I wish to keep it. Nevertheless, you are right about the image being a "positioned image", so I tried to work with it. Unfortunately, the problem still occurs. I will adapt my original post to take this into account. – vrcAlbert Aug 29 '19 at 13:38
  • Unfortunately I do not know what the problem with positioned images is. It might be a bug, so I suggest you file it on the public issue tracker https://issuetracker.google.com. In the meantime, you need to use a workaround. If converting positioned images to inline images is not a suitable workaround for you, you might think about converting your docs to PDF before merging (you can convert them back to a Docs file once merged, with OCR). – ziganotschka Aug 29 '19 at 14:15
  • Thanks for sharing the issue tracker, I was not aware of its existence. I found the bug referenced in it; not sure they will do something about it: https://issuetracker.google.com/issues/36763970 Anyway, thanks for your time and suggestions! – vrcAlbert Aug 30 '19 at 13:16

Add your PositionedImages at the end of your script, after you have added all your other elements. From my experience, if other elements get added to the document after the image-positioning paragraph, extra images will be added.

You can accomplish this by storing a reference to the paragraph element that will be used as the image holder, together with the blob from the image and any other information (height, width, etc.). Then, at the end of your script, just iterate over the stored references and add the images.

var imageParagraphs = [];    

...

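case DocumentApp.ElementType.PARAGRAPH:
    var paragraph = element.asParagraph();
    var positionedImages = paragraph.getPositionedImages();
    if (positionedImages.length > 0) {
      var imageData = [];
      // Plain for loop: the non-standard "for each" syntax is not supported by the V8 runtime
      for (var imgIdx = 0; imgIdx < positionedImages.length; imgIdx++) {
        var image = positionedImages[imgIdx];
        imageData.push({
          height: image.getHeight(),
          width: image.getWidth(),
          leftOffset: image.getLeftOffset(),
          topOffset: image.getTopOffset(),
          layout: image.getLayout(),
          blob: image.getBlob()
        });
        // Detach the image so that only the paragraph itself is appended
        paragraph.removePositionedImage(image.getId());
      }
      // Keep a reference to the appended paragraph; its images are added at the very end
      var p = merged_doc_body.appendParagraph(paragraph);
      imageParagraphs.push({element: p, imageData: imageData});
    } else {
      merged_doc_body.appendParagraph(paragraph);
    }
  break;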

...

// Once every other element has been appended, add the stored images
for (var n = 0; n < imageParagraphs.length; n++) {
  var imageData = imageParagraphs[n].imageData;
  var imageParagraph = imageParagraphs[n].element;
  for (var m = 0; m < imageData.length; m++) {
    var image = imageData[m];
    imageParagraph.addPositionedImage(image.blob)
      .setHeight(image.height)
      .setWidth(image.width)
      .setLeftOffset(image.leftOffset)
      .setTopOffset(image.topOffset)
      .setLayout(image.layout);
  }
}
NateG