Here's a runnable version of the initial code (I have slightly modified the input string):
```javascript
String.prototype.toSentenceCase = function() {
  var i, j, str, lowers, uppers;
  str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
    return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
  });
  // Certain words such as initialisms or acronyms should be left uppercase
  uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
  for (i = 0, j = uppers.length; i < j; i++)
    str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
      uppers[i].toUpperCase());
  // To remove special characters like ':' and '?'
  str = str.replace(/[""]/g,'');
  str = str.replace(/[?]/g,'');
  str = str.replace(/[:]/g,' - ');
  return str;
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = input.toSentenceCase()
console.log(result)
```
> I ran into other issues like some letters in the sentence are still in Uppercase, especially texts in and after Double Quotes (" ") and camelcase texts.
Some letters remain uppercase because you are not calling `.toLowerCase()` anywhere in your code, except in the replacer at the beginning, and that regex targets only the initial letters of sentences, not the other letters.
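A quick way to see this is to log what the original regex actually matches. A replacer function only ever receives the matched substrings, so any letter outside those matches is never lowercased. A minimal sketch, reusing the example input:

```javascript
// Sketch: list every substring the original regex matches.
// String.prototype.match with the g flag returns all full matches.
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`;

const matches = input.match(/(^\w{1}|\.\s*\w{1})/gi);

// Only these substrings are ever passed to the replacer function.
console.log(matches); // [ 'p', '. a' ]
```

Words like `This`, `String` and `cONVERTED` are never part of a match, so they keep their original casing.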
It can be helpful to first lowercase all letters, and then uppercase some of them (acronyms and the initial letters of sentences). So let's call `.toLowerCase()` at the beginning:
```javascript
String.prototype.toSentenceCase = function() {
  var i, j, str, lowers, uppers;
  str = this.toLowerCase();
  // ...
  return str;
}
```
Next, let's take a look at this regex:
/(^\w{1}|\.\s*\w{1})/gi
The parentheses are unnecessary, because the capturing group is not used in the replacer function. The `{1}` quantifiers are also unnecessary, because by default `\w` matches only one character. So we can simplify the regex like so:
/^\w|\.\s*\w/gi
This regex finds two matches in the (already lowercased) input string: `p` at the very start, and `. a` after "nasa".
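Both matches contain only one letter (matched by `\w`), so in the replacer function we can safely call `txt.toUpperCase()` instead of the current, more complex expression (`txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase()`). In fact, the current expression fails on the `. a` match: it uppercases only the `.` and leaves the letter lowercase. We can also use an arrow function: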
```javascript
String.prototype.toSentenceCase = function() {
  var i, j, str, lowers, uppers;
  str = this.toLowerCase();
  str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
  // ...
  return str;
}
```
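To double-check both simplifications, the sketch below compares the original and simplified regexes on the lowercased example input, then runs the old and new replacer expressions on the `. a` match. It uses only built-in string methods; `oldReplacer` and `newReplacer` are just illustrative names.

```javascript
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`.toLowerCase();

// The parentheses and {1} quantifiers do not change what is matched.
const before = input.match(/(^\w{1}|\.\s*\w{1})/gi);
const after = input.match(/^\w|\.\s*\w/gi);
console.log(JSON.stringify(before) === JSON.stringify(after)); // true

// Old replacer: uppercases only the first character of the match (here, the period).
const oldReplacer = (txt) => txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
// New replacer: uppercases the whole match, which holds a single letter.
const newReplacer = (txt) => txt.toUpperCase();

console.log(oldReplacer('. a')); // prints ". a": the letter stays lowercase
console.log(newReplacer('. a')); // prints ". A"
```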
However, the initial letter of the third sentence is still not uppercased, because that sentence starts with a quote, which `\w` does not match. Since we are going to remove quotes and question marks anyway, let's do that at the beginning.
Let's also simplify and combine the regexes:

```javascript
// Before
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
// After
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
```
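A character class matches any one of the characters listed inside it, so `["?]` removes both quotes and question marks in a single pass (and listing the quote twice, as in `[""]`, adds nothing). A small sketch comparing the two versions on a made-up sample string:

```javascript
const sample = 'is "this" the same? yes: it is';

// Before: one replace call per character class.
const twoPasses = sample.replace(/[""]/g, '').replace(/[?]/g, '');
// After: a single character class covering both characters.
const onePass = sample.replace(/["?]/g, '');

console.log(twoPasses === onePass); // true
console.log(onePass); // is this the same yes: it is
```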
So:
```javascript
String.prototype.toSentenceCase = function() {
  var i, j, str, lowers, uppers;
  str = this;
  str = str.toLowerCase();
  str = str.replace(/["?]/g,'');
  str = str.replace(/:/g,' - ');
  str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
  // ...
  return str;
}
```
Now the initial letter of the third sentence is correctly uppercased. That's because by the time we uppercase the initial letters, the quote has already been removed, so the third sentence no longer starts with a quote.
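The order of the `replace` calls is what makes the difference here. This sketch uses two hypothetical helpers, `capitalize` and `removeQuotes`, that wrap the regexes above, and applies them to a short sample in both orders:

```javascript
// Hypothetical helpers wrapping the two regexes from the function above.
const capitalize = (text) => text.replace(/^\w|\.\s*\w/gi, (match) => match.toUpperCase());
const removeQuotes = (text) => text.replace(/["?]/g, '');

const s = 'another sentence. "third" sentence.';

// Capitalize first: the period is followed by a quote, which \w does not match.
console.log(removeQuotes(capitalize(s))); // Another sentence. third sentence.

// Remove quotes first: the period is now followed by "third", which gets capitalized.
console.log(capitalize(removeQuotes(s))); // Another sentence. Third sentence.
```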
What's left is to uppercase acronyms. Because the whole string is now lowercase, a pattern like `\bNasa\b` no longer matches `nasa`, so in your regex you probably want to use the `i` flag as well for case-insensitive matches.
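To see why the flag matters, here is the original loop's `RegExp` construction applied to a lowercased fragment of the example, once with only the `g` flag and once with `gi`:

```javascript
const lowered = 'keeping the acronyms as it is like nasa.';
const word = 'Nasa'; // an entry from the original uppers list

// Same RegExp construction as the original loop, without and with the i flag.
const withoutI = lowered.replace(new RegExp('\\b' + word + '\\b', 'g'), word.toUpperCase());
const withI = lowered.replace(new RegExp('\\b' + word + '\\b', 'gi'), word.toUpperCase());

console.log(withoutI); // keeping the acronyms as it is like nasa.
console.log(withI);    // keeping the acronyms as it is like NASA.
```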
Instead of using a `for` loop, it's possible to use a single regex that looks for all the acronyms at once and uppercases each match. This also lets us get rid of most of the variables. Like so:
```javascript
String.prototype.toSentenceCase = function() {
  var str;
  str = this;
  str = str.toLowerCase();
  str = str.replace(/["?]/g,'');
  str = str.replace(/:/g,' - ');
  str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
  str = str.replace(/\b(id|tv|nasa|acronyms)\b/gi, (txt) => txt.toUpperCase());
  return str;
}
```
And it looks like we are now getting the correct result!
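If the acronym list is likely to grow, the same alternation could be built from an array. This is only a sketch: `ACRONYMS` and `uppercaseAcronyms` are hypothetical names, and the entries are assumed to contain no regex special characters (otherwise they would need escaping). The `\b` word boundaries keep words that merely contain an acronym, such as `idea`, unchanged:

```javascript
// Hypothetical list of words to keep uppercase (no regex metacharacters assumed).
const ACRONYMS = ['id', 'tv', 'nasa', 'acronyms'];

// Builds /\b(id|tv|nasa|acronyms)\b/gi from the list.
const acronymRegex = new RegExp('\\b(' + ACRONYMS.join('|') + ')\\b', 'gi');

function uppercaseAcronyms(str) {
  return str.replace(acronymRegex, (match) => match.toUpperCase());
}

console.log(uppercaseAcronyms('my id is not an idea about nasa tv.'));
// my ID is not an idea about NASA TV.
```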
Three more things, though:
- Instead of creating and mutating the `str` variable, we can start from `this` directly and chain the method calls.
- It might make sense to rename the `txt` variables to `match`, since they are regex matches.
- Modifying a built-in object's prototype is a bad idea, because it affects every string in the program and can clash with other code or with future additions to the language. Creating a standalone function is a better idea.
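One concrete side effect of patching a built-in prototype: a method added to `String.prototype` by plain assignment is enumerable, so it shows up in `for...in` loops over every string in the program, including code you didn't write. A small sketch (the method body is only a placeholder):

```javascript
// Patch the prototype by plain assignment, as in the original code.
String.prototype.toSentenceCase = function () {
  return this.toLowerCase(); // placeholder body, irrelevant for this demo
};

// for...in visits own indices, then inherited enumerable properties.
const keys = [];
for (const key in 'ab') {
  keys.push(key);
}

console.log(keys); // [ '0', '1', 'toSentenceCase' ]
```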
Here's the final code:
```javascript
function convertToSentenceCase(str) {
  return str
    .toLowerCase()
    .replace(/["?]/g, '')
    .replace(/:/g, ' - ')
    .replace(/^\w|\.\s*\w/gi, (match) => match.toUpperCase())
    .replace(/\b(id|tv|nasa|acronyms)\b/gi, (match) => match.toUpperCase())
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = convertToSentenceCase(input)
console.log(result)
```