
I want to create a script to capitalize sentences in a Google Doc without changing the existing attributes of certain words. For example, a Google Doc may contain several paragraphs, each with several sentences, along with hyperlinks and words in boldface, italics, underline, etc. I want all of these attributes to stay intact: the script should only capitalize the sentences, without removing the existing attributes of those words.

I wrote the following Google Docs script, which did capitalize the sentences, but it also removed all the other attributes mentioned above (hyperlinks, boldface, italics, underline).

function cap12() {

  // define function "replacement" to change the matched pattern to uppercase
  function replacement(match) { return match.toUpperCase(); }

  // define regex "start of text or a period, followed by zero or more
  // blank spaces, followed by any lowercase character"
  // (findText() takes the pattern as a string, so the backslashes are doubled)
  var regex1 = "(^|\\.)(\\s*)[a-z]";
  var regex2 = /(^|\.)(\s*)[a-z]/;
  // Logger.log(regex1, regex2);

  // get text matching pattern "regex"
  var body = DocumentApp.getActiveDocument().getBody();
  var foundElement = body.findText(regex1);
  // Logger.log(foundElement);

  while (foundElement != null) {
    // Get the text object from the element
    var foundText = foundElement.getElement().asText();   

    // capitalize the character after the period   
    var str1 = foundText.getText();
    var str2 = str1.replace(regex2, replacement);
    foundText.setText(str2);

    // Find the next match
    foundElement = body.findText(regex1, foundElement);
  }

}

I would appreciate any help pointing out my errors. Thank you.

NOTE: the script cap12 above continues my project to develop a Google Apps Script that capitalizes sentences, as documented in the post "google doc script to capitalize sentences". The final script in that post, cap7, only worked locally on the selected text (i.e., not over the entire document), and it also removed all attributes such as hyperlinks, boldface, italics and underline.
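One likely cause of the lost formatting is the call to setText(): it replaces the entire contents of the Text element, so the per-character formatting runs (bold, italics, links) are not carried over, which matches what is observed above. A minimal sketch of an alternative, assuming Apps Script's Text methods insertText(), deleteText(), getAttributes() and setAttributes() behave as documented, is to leave the element's text alone and rewrite only the one lowercase letter at each sentence start, then re-apply that letter's saved attributes. The function names sentenceStartOffsets, capitalizeAt and capitalizeSentencesPreservingFormat are hypothetical, and skipping unset (null) attributes when re-applying them is an assumption that may not cover every case.

```javascript
// Hypothetical helper: offsets of lowercase letters that start a sentence,
// using the same rule as cap12: start of the string, or a period,
// followed by zero or more spaces.
function sentenceStartOffsets(str) {
  var re = /(^|\.)(\s*)([a-z])/g;
  var offsets = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    // the letter sits after the optional period and the spaces
    offsets.push(m.index + m[1].length + m[2].length);
  }
  return offsets;
}

// Hypothetical helper: replace the single character at offset i of a Text
// element with its uppercase form, then re-apply its saved formatting.
function capitalizeAt(text, i) {
  var ch = text.getText().charAt(i);
  var upper = ch.toUpperCase();
  if (upper === ch) return;            // already uppercase (or out of range)
  var attrs = text.getAttributes(i);   // formatting of the original character
  text.insertText(i, upper);           // the capital goes in front of it...
  text.deleteText(i + 1, i + 1);       // ...and the old lowercase letter is removed
  var keep = {};                       // assumption: skip attributes that are unset
  for (var key in attrs) {
    if (attrs[key] != null) keep[key] = attrs[key];
  }
  text.setAttributes(i, i, keep);
}

// Apps Script driver (hypothetical name): visit every Text element in the body,
// so the whole document is processed and other characters are never rewritten.
function capitalizeSentencesPreservingFormat() {
  var body = DocumentApp.getActiveDocument().getBody();
  var found = body.findElement(DocumentApp.ElementType.TEXT);
  while (found !== null) {
    var text = found.getElement().asText();
    sentenceStartOffsets(text.getText()).forEach(function (i) {
      capitalizeAt(text, i);
    });
    found = body.findElement(DocumentApp.ElementType.TEXT, found);
  }
}
```

Because a lowercase letter and its capital have the same length, the offsets computed up front stay valid while the letters are replaced one by one.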

By reading the related posts listed in the right column, more or less at random and guided perhaps by instinct, I stumbled on a nice script, likely written by a pro, from which newbies like me could learn a lot. So I describe what I did below in case someone is interested.

From the post "Google Apps Script/Javascript search and replace with regex not working", which is related to my present post, I found another related post, "Find and change unknown strings to UPPERCASE in Google Apps Script Document using JS", in which Mogsdad posted a very nice piece of code. At first, reading through the discussion, I thought I had to understand the code in Mogsdad's answer.

Then I noticed the sentence "The following script is part of a document add-on, source available in this gist, in changeCase.js." (I still needed to find out what a "gist" is; it is something related to GitHub, and it is explained in the post "What is the difference between github and gist? [closed]".)

So I followed the "this gist" link and indeed found the script changeCase.js, which did ALMOST what I was trying to develop (i.e., "Sentence case"; some problems are described below):

changeCase.js - Document add-in, provides case-change operations in the add-in Menu.

- onOpen - installs "Change Case" menu
- _changeCase - worker function to locate selected text and change text case. Case conversion is managed via callback to a function that accepts a string as a parameter and returns the converted string.
- helper functions for five cases:
  - UPPER CASE
  - lower case
  - Title Case
  - Sentence case
  - camelCase
- Fountain-lite, screenplay formatting - see http://fountain.io/

It is amazing code; it would take me a long time to reach the level needed to write something like it.

I installed the script changeCase.js in my Google Doc, then closed and reopened the document to activate the add-on. I then tested the "Sentence case" option, and the script worked nicely in the sense that I could select just the text in which I wanted to capitalize sentences, avoiding words with special formatting such as boldface, italics, underline, etc.

But when the selected text included words in boldface (and/or italics, and/or underline), the boldface attribute was removed (exactly the problem I wanted to solve). So changeCase.js did not solve my problem, but it did provide a workaround.

The script changeCase.js worked only locally on the selected text, whereas my script cap7 (in the post "google doc script to capitalize sentences") worked on the whole paragraph even when I selected only a portion of it.

In other words, I could modify my script cap7 to work like the "Sentence case" option of changeCase.js. I believe the problem was that I did a "global" search and replace instead of a "local" search and replace within the selected text.
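The "local" idea can be sketched on a plain string: find the sentence-start letters with the same rule as cap12, then keep only those whose offsets fall inside the selected span. In Apps Script the span would come from the selection's range elements (getStartOffset() and getEndOffsetInclusive() when isPartial() is true, otherwise the whole element). The helper name sentenceStartsInRange is hypothetical, and the returned offsets are meant to be updated one character at a time rather than by rewriting the whole text with setText().

```javascript
// Hypothetical helper: offsets of sentence-start letters (start of string, or a
// period followed by optional spaces, as in cap12), kept only if they fall
// inside [start, end], i.e. inside the user's selection.
function sentenceStartsInRange(str, start, end) {
  var re = /(^|\.)(\s*)([a-z])/g;
  var result = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    var pos = m.index + m[1].length + m[2].length; // offset of the letter itself
    if (pos >= start && pos <= end) result.push(pos);
  }
  return result;
}

// Example: only "two" and "three" start inside the selected span [5, 15].
var sample = "one. two. three. four.";
console.log(sentenceStartsInRange(sample, 5, 15)); // → [5, 10]
```

Letters outside the span ("one" at 0 and "four" at 17) are left untouched, which is the "local" behaviour of changeCase.js.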

A problem with the "Sentence case" option of changeCase.js is that it converts all characters in the selected text to lowercase, which is not what I want, since there are characters I want to keep in uppercase (e.g., names of people). I only want to capitalize the first letter of each sentence within the selected text, without modifying anything else in those sentences.

To do that, simply remove the call to toLowerCase() in the code:

// https://stackoverflow.com/a/19089667/1677912
function _toSentenceCase (str) {
  var rg = /(^\s*\w{1}|\.\s*\w{1})/gi;
  return str.toLowerCase().replace(rg, function(toReplace) {
    return toReplace.toUpperCase();
  });
}

That is, use the following modified code:

function _toSentenceCase (str) {
  var rg = /(^\s*\w{1}|\.\s*\w{1})/gi;
  return str.replace(rg, function(toReplace) {
    return toReplace.toUpperCase();
  });
}

It worked.
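The effect of dropping toLowerCase() can be checked on a plain string outside Docs: names and acronyms such as "NASA" keep their capitals. The sketch below runs the modified _toSentenceCase next to a hypothetical stricter variant, _toSentenceCaseStrict, which assumes that "!" and "?" also end sentences and that a sentence terminator must be followed by whitespace. With the original pattern `\.\s*\w`, the period inside a file name such as changeCase.js is treated as a sentence end.

```javascript
// Modified helper from the post: same regex as changeCase.js, no toLowerCase().
function _toSentenceCase(str) {
  var rg = /(^\s*\w{1}|\.\s*\w{1})/gi;
  return str.replace(rg, function (toReplace) {
    return toReplace.toUpperCase();
  });
}

// Hypothetical stricter variant (assumption): ".", "!" and "?" end sentences,
// and a terminator must be followed by at least one whitespace character.
function _toSentenceCaseStrict(str) {
  var rg = /(^\s*|[.!?]\s+)([a-z])/g;
  return str.replace(rg, function (match, prefix, letter) {
    return prefix + letter.toUpperCase();
  });
}

var text = "see changeCase.js for details. it works! does it? yes, for NASA too.";
console.log(_toSentenceCase(text));
// → "See changeCase.Js for details. It works! does it? yes, for NASA too."
console.log(_toSentenceCaseStrict(text));
// → "See changeCase.js for details. It works! Does it? Yes, for NASA too."
```

Both versions still return a new string, so inside Docs the result has to be written back without setText() if the formatting is to survive.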

Another problem with changeCase.js is that it works on only one paragraph at a time, which is not very efficient.

I want to develop a script that works on the entire document without removing existing attributes (hyperlinks, boldface, italics, underline).

I would appreciate it if someone could point out the errors or problems in my script cap12b in the post "google doc script, attributes (bold, italics, underline) not shown in log".

Luke V
  • In this case, did you try doing the replacement directly within the editAsText() property of the body? You could run a method beforehand to scan for attributes before replacing the text. – Robin Gertenbach Oct 24 '15 at 07:49
  • Thanks for your input, Robin. I am not sure I understood what you had in mind in terms of using the method "editAsText()", but I did see that I did not use "editAsText()" in my code. I will try a few things to include this method. Your second sentence is a bit more cryptic: which method do you suggest running to scan for attributes before using "setText"? Perhaps "getAttributes()" or "getAttributeIndices()"? I will try some of these and report the results on this page. Thanks again for the tips. – Luke V Oct 24 '15 at 13:10
  • Note that in general replacing text while preserving formatting is ill-defined. If you want to replace '**some**_thing_like`this`' with 'somenewthing' who knows what style 'somenewthing' should have. But certainly in the case where what you are replacing has uniform attributes throughout, you would expect the replacement to retain those attributes. I'm working on something that replaces bits of text too and am finding that it's hard not to lose formatting. – Baxissimo Oct 18 '16 at 05:57
  • I expected that at the very least I would be able to call `attrs = text.getAttributes(startPos); text.deleteText(startPos, endPos); text.insertText(startPos, newText); text.setAttributes(startPos, startPos + newText.length - 1, attrs);` and it would reapply the attributes that got wiped out. But even that doesn't work reliably. Sometimes the getAttributes call doesn't actually get attributes. Sometimes it seems the setAttributes call fails to work. Sometimes it works fine. – Baxissimo Oct 18 '16 at 05:57
  • One important thing I just discovered is that the above mojo fails if you're editing a non-Text element with "element.editAsText()", like a Paragraph. If you try to edit a Paragraph you will lose formatting and won't be able to properly read attributes off the text to restore them. (And they do this in the Google Translate example, so it probably destroys formatting too.) You want to always dive into a Paragraph's children to get at Texts directly if you plan to modify them. – Baxissimo Oct 18 '16 at 06:46
