
Hi, I am sending a snapshot of the current web page using the OneNote API:

http://msdn.microsoft.com/en-us/library/office/dn575431(v=office.15).aspx

http://msdn.microsoft.com/en-us/library/office/dn575438(v=office.15).aspx#sectionSection4

When posting the multipart request, I put the HTML content in the 'MyAppHtmlId' part and reference it like this:

<img data-render-src="name:MyAppHtmlId" alt="a cool image" width="500"/>

The HTML content is obtained with:

document.documentElement.outerHTML;
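For reference, here is a minimal sketch of how such a multipart body could be assembled by hand. It assumes the page's own HTML goes in a part named "Presentation" and the captured DOM goes in the MyAppHtmlId part referenced by the img tag above. The function name, boundary string and page title are hypothetical. The resulting string would be POSTed with a Content-Type of multipart/form-data; boundary=<your boundary> plus your OAuth token. Check the OneNote API documentation linked above for the exact endpoint and required parts.

```javascript
// Hypothetical helper: builds a multipart/form-data body for a OneNote page.
// Assumption: the page HTML lives in a part named "Presentation", and its
// <img data-render-src="name:MyAppHtmlId"> asks the service to render the
// HTML sent in the "MyAppHtmlId" part as an image.
function buildMultipartBody(capturedHtml, boundary) {
    var presentation =
        '<!DOCTYPE html><html><head><title>Page snapshot</title></head><body>' +
        '<img data-render-src="name:MyAppHtmlId" alt="a cool image" width="500"/>' +
        '</body></html>';

    // One part: delimiter line, part headers, blank line, content, CRLF.
    function part(name, contentType, content) {
        return '--' + boundary + '\r\n' +
            'Content-Disposition: form-data; name="' + name + '"\r\n' +
            'Content-Type: ' + contentType + '\r\n' +
            '\r\n' +
            content + '\r\n';
    }

    // Presentation part first, then the captured DOM, then the close delimiter.
    return part('Presentation', 'text/html', presentation) +
        part('MyAppHtmlId', 'text/html', capturedHtml) +
        '--' + boundary + '--\r\n';
}

// Usage: in the browser, the first argument would be the captured outerHTML.
var body = buildMultipartBody('<html><body>Hello</body></html>', 'PartBoundary123');
console.log(body.indexOf('name="MyAppHtmlId"') !== -1); // true
```

Building the body as a plain string keeps every byte visible, which helps when comparing what was sent with what OneNote rendered.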

The problem is that sometimes the snapshot saved in OneNote is not exactly what I see in the browser. But when I test the same page with the Chrome extension "OneNote Clipper", it works well (example page: https://stackoverflow.com/).

Am I using the wrong JavaScript code to get the HTML content, or is there something else I missed about the OneNote API?

Asked by JunjieLi
Can you please include an example of the full DOM that you got by calling document.documentElement.outerHTML where it didn't work as you expected? Thanks, James – JamesLau-MSFT Sep 19 '14 at 04:19

1 Answer


There are a few ways you'll want to modify the result of document.documentElement.outerHTML before sending it up to the service.

  1. Make sure you include the DOCTYPE. document.documentElement.outerHTML returns only the <html> element, so the DOCTYPE is not part of it. Without DOCTYPE data, the screenshot will be rendered in quirks mode, which can easily lead to less-than-ideal screenshots. There's a post that explains how to generate the correct DOCTYPE string: "Get DocType of an HTML as string with Javascript". I've incorporated their code in the sample below.

  2. Since the screenshot will be generated using only the DOM, with no knowledge of the original URL, you'll want to add a base tag if there isn't one present. This ensures that relative paths used for resources (images, stylesheets, scripts) are resolved correctly.

  3. You'll probably want to remove any noscript tags. Since your payload will be rendered without running JavaScript, any noscript tags that are present will be rendered, which could cause unwanted content to appear in your screenshot.
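The three adjustments above can also be written as small helpers that don't touch the DOM, which makes them easy to check outside the browser. This is a sketch with hypothetical names (doctypeString, baseHrefFor, stripFragments). It assumes an object shaped like document.doctype (name, publicId, systemId), treats a null doctype as "send no DOCTYPE" so the snapshot uses the same rendering mode as the browser, and swaps the string slicing for the standard URL constructor when computing the base directory.

```javascript
// Step 1: serialize a DocumentType-like object ({ name, publicId, systemId }).
// Returns '' when there is no doctype (document.doctype is null), so a page
// the browser renders in quirks mode is sent in the same mode.
function doctypeString(node) {
    if (!node) {
        return '';
    }
    return '<!DOCTYPE ' + node.name +
        (node.publicId ? ' PUBLIC "' + node.publicId + '"' : '') +
        (!node.publicId && node.systemId ? ' SYSTEM' : '') +
        (node.systemId ? ' "' + node.systemId + '"' : '') +
        '>';
}

// Step 2: directory of the page URL, suitable for <base href="...">.
// Resolving "." against the URL drops the last path segment, query and hash.
function baseHrefFor(pageUrl) {
    return new URL('.', pageUrl).href;
}

// Step 3: remove each serialized noscript element (its outerHTML) from the
// page HTML; replace() with a string pattern removes the first literal match.
function stripFragments(html, fragments) {
    for (var i = 0; i < fragments.length; i++) {
        html = html.replace(fragments[i], '');
    }
    return html;
}

console.log(doctypeString({ name: 'html', publicId: '', systemId: '' })); // <!DOCTYPE html>
console.log(doctypeString(null) === ''); // true
console.log(baseHrefFor('https://stackoverflow.com/questions/1/example?tab=votes#top'));
// https://stackoverflow.com/questions/1/
```

In the browser these would be called as doctypeString(document.doctype), baseHrefFor(document.location.href) and stripFragments(html, Array.prototype.map.call(document.getElementsByTagName('noscript'), function (n) { return n.outerHTML; })).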

Here's some sample code that should take care of these goals while minimizing changes to the current page's DOM:

function getDom() {
    // Get the DOCTYPE. document.doctype is null when the page has none; in that
    // case send no DOCTYPE, so the snapshot uses the same (quirks) mode as the browser.
    var node = document.doctype;
    var doctype = node
        ? '<!DOCTYPE '
            + node.name
            + (node.publicId ? ' PUBLIC "' + node.publicId + '"' : '')
            + (!node.publicId && node.systemId ? ' SYSTEM' : '')
            + (node.systemId ? ' "' + node.systemId + '"' : '')
            + '>'
        : '';

    // Before we get the document's outerHTML, create a base tag if one doesn't exist
    var addedBase = null;
    if (!document.getElementsByTagName('base').length) {
        var baseUrl = document.location.href;
        baseUrl = baseUrl.substring(0, baseUrl.lastIndexOf('/') + 1);

        addedBase = document.createElement('base');
        addedBase.href = baseUrl;
        // The base tag is the first child of head to ensure we don't misload links/scripts in the head
        document.head.insertBefore(addedBase, document.head.firstChild);
    }

    // Store the outerHTML with DOCTYPE (outerHTML alone does not include the DOCTYPE)
    var html = doctype + document.documentElement.outerHTML;

    // Remove the temporary base tag so links on the live page keep resolving as before
    if (addedBase) {
        document.head.removeChild(addedBase);
    }

    // Remove the text for any noscript elements from html
    var noscriptElements = document.getElementsByTagName('noscript');
    for (var i = 0; i < noscriptElements.length; i++) {
        html = html.replace(noscriptElements[i].outerHTML, '');
    }

    return html;
}
Thanks Eleazar, the code works fine, except that an exception is thrown when document.doctype === null. – JunjieLi Sep 25 '14 at 05:51