I am interested in validating or automatically correcting the use of the indefinite articles "a" and "an" in blocks of English text from a textarea.
The grammatical rule is that the choice of article depends on the sound that begins the next word. Details here and here. That seems incredibly broad to implement; however, a previous answer (How can I correctly prefix a word with "a" and "an"?) suggests using a huge database of English text to build heuristics that infer the correct indefinite article in a given situation. Eamon Nerbonne comments that he has done this, so how can I apply that solution to this practical implementation?
The function I have so far implements the simplest part of the grammatical rule: it uses "an" when the following word starts with a vowel letter, and "a" otherwise. It also respects the existing capitalization of the article. In actual use, though, this isn't practical because exceptions to that rule are very common. For example, "a horse" is correct while "a honour" and "a HTTP address" are not.
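To make the failure cases concrete, here is a minimal DOM-free sketch of the same first-letter rule. The name `naiveArticle` and the `samples` list are only illustrative (they are not part of my actual function); the words are taken from the examples in this question.

```javascript
// Hypothetical, DOM-free version of the first-letter rule (illustration only).
function naiveArticle(word) {
  // "an" before a, e, i, o, u (any case); "a" before everything else.
  return /^[aeiou]/i.test(word) ? 'an' : 'a';
}

// Words from this question, paired with the article that is actually correct.
const samples = [
  { word: 'horse',  correct: 'a'  },
  { word: 'apple',  correct: 'an' },
  { word: 'honour', correct: 'an' }, // silent "h"
  { word: 'HTTP',   correct: 'an' }, // spelled out: "aitch-tee-tee-pee"
  { word: 'user',   correct: 'a'  }, // starts with a "y" sound
];

for (const { word, correct } of samples) {
  const guess = naiveArticle(word);
  const verdict = guess === correct ? 'ok' : 'wrong, should be "' + correct + '"';
  console.log(guess + ' ' + word + ': ' + verdict);
}
```

The rule gets "horse" and "apple" right but picks the wrong article for "honour", "HTTP", and "user".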
How can my function be expanded to properly handle actual pronunciation of words following the articles, including silent letters, acronyms, and "sometimes-y"? I don't require 100% accuracy - something better than 80% would be enough to improve the text I'm correcting.
Here's my fixArticles() function; see the snippet for a running example.
function fixArticles( txt ) {
    // Match each "a"/"an" (any case) together with the word that follows it.
    var valTxt = txt.replace(/\b(a|an) (\w*)\b/gim, function( match, article, following ) {
        // Keep the original first letter so the article's capitalization is preserved.
        var newArticle = article.charAt(0);
        switch (following.charAt(0).toLowerCase()) {
            case 'a':
            case 'e':
            case 'i':
            case 'o':
            case 'u':
                newArticle += 'n'; // an
                break;
            default:
                // a
                break;
        }
        // Highlight any article that was changed.
        if (newArticle !== article) {
            newArticle = "<span class='changed'>" + newArticle + "</span>";
        }
        return newArticle+' '+following;
    });
    // Show the result, turning newlines into line breaks.
    document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm,'<br/>');
}
input, label {
display:block;
}
.changed {
font-weight: bold;
}
<label for="input-text">Enter text</label>
<textarea id="input-text" cols="50" rows="5">An wise man once said: "A apple an day keeps the doctor away."
Give me an break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.
</textarea>
<input type="button" value="Fix a/an" onClick="fixArticles(document.getElementById('input-text').value)">
<hr>
<div id="output-text"></div>
The expected output for the sample input is:
A wise man once said: "An apple a day keeps the doctor away."
Give me a break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.
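Since I only need "good enough" accuracy, one possible way to score a candidate fixer against this expected output is a line-by-line comparison. This is just a sketch: `scoreFixer` is a hypothetical helper name, it assumes the fixer is a pure text-to-text function (unlike my fixArticles(), which writes to the DOM), and it counts whole lines rather than individual articles.

```javascript
// Hypothetical helper: fraction of lines a candidate fixer gets exactly right.
// `fix` is assumed to be a pure function from input text to corrected text.
function scoreFixer(fix, inputLines, expectedLines) {
  const actualLines = fix(inputLines.join('\n')).split('\n');
  let correct = 0;
  expectedLines.forEach((line, i) => {
    if (actualLines[i] === line) correct++;
  });
  return correct / expectedLines.length;
}

// Sample input and expected output from this question.
const inputLines = [
  'An wise man once said: "A apple an day keeps the doctor away."',
  'Give me an break.',
  'I would like an apple.',
  'My daughter wants a hippopotamus for Christmas.',
  'It was an honest error.',
  'Did a user click the button?',
  'An MSDS (material safety data sheet) was used to record the data.',
];
const expectedLines = [
  'A wise man once said: "An apple a day keeps the doctor away."',
  'Give me a break.',
  'I would like an apple.',
  'My daughter wants a hippopotamus for Christmas.',
  'It was an honest error.',
  'Did a user click the button?',
  'An MSDS (material safety data sheet) was used to record the data.',
];

// Baseline: a "fixer" that changes nothing at all.
const baseline = scoreFixer((text) => text, inputLines, expectedLines);
console.log(baseline.toFixed(2));
```

Even the do-nothing baseline matches 5 of the 7 lines here, so a score on this small sample says little by itself; a larger and more varied sample would be needed to judge the 80% target.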