I've made a function `strConv()` to run a number of `replace()` methods over text input from the user. It has a regex for each modification:
List of Replacements
- Smart Quotes
  - Replace the straight quotes `'` and `"` with `‘`, `’` and `“`, `”`
- Em Dashes
  - Replace `--` with ` — `
- Ellipsis
  - Replace `...` with `…`
- Ordinals
  - Replace the suffix of all ordinals with a superscript equivalent.
    - e.g. `1st` to `1<sup>st</sup>` or `20th` to `20<sup>th</sup>`
- Single Digits
  - Any occurrence of a single-digit number will be converted to its word equivalent.
    - e.g. `1` to `one` or `7` to `seven`
- Court Decision Titles
  - If there are any patterns like this:
    - `<i>`ONE OR MORE WORDS`</i>` v. `<i>`ONE OR MORE WORDS`</i>`
  - Remove the 2nd and 3rd tag:
    - `<i>`ONE OR MORE WORDS v. ONE OR MORE WORDS`</i>`
- United States Abbreviation
  - Replace `U.S.` with `US`
- Percentages
  - Replace `%` with ` percent`
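To make the Court Decision Titles rule concrete, here is a minimal sketch of the transformation it describes. The helper name `mergeCaseTitle` is hypothetical and not part of `strConv()`, and the sketch assumes the input string still contains the literal `<i>` tags (for instance, a string read via `innerHTML` rather than `innerText`).

```javascript
// Hypothetical helper (not part of the original strConv()): merges two
// adjacent <i> elements joined by " v. " into a single <i> element.
function mergeCaseTitle(str) {
  // [^<]+ keeps each capture inside one tag pair, so other <i> elements
  // on the same line are not swallowed by a greedy match.
  return str.replace(
    /<i>([^<]+)<\/i>(\s+v\.\s+)<i>([^<]+)<\/i>/g,
    "<i>$1$2$3</i>"
  );
}

console.log(mergeCaseTitle("<i>Roe</i> v. <i>Wade</i>"));
// <i>Roe v. Wade</i>
```

This version does not use `\b` after `v\.`, because a period followed by whitespace is not a word boundary.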
I'm getting the wrong results in two places: ordinals and court decision titles. I'm including all of the regexes because my problem may stem from the order in which they are arranged and how one of them affects another's results.
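As an illustration of how ordering could matter, here is a small sketch that isolates two of the rules. The function names `ordinals` and `singleDigitOne` are hypothetical stand-ins, and only the digit `1` is handled to keep it short; it is a reduced experiment, not the full `strConv()`.

```javascript
// Simplified stand-ins for two of the rules (hypothetical names).
function ordinals(str) {
  return str.replace(/\b(\d{1,3})(th|nd|rd|st)\b/g, "$1<sup>$2</sup>");
}

function singleDigitOne(str) {
  return str.replace(/\b1\b/g, "one");
}

// Ordinals first: the "<" inserted after the digit creates a word
// boundary, so the later \b1\b rule matches the 1 in "1<sup>st</sup>".
console.log(singleDigitOne(ordinals("1st"))); // one<sup>st</sup>

// Single digits first: "1st" has no boundary between "1" and "s",
// so the digit is left alone and the ordinal rule still applies.
console.log(ordinals(singleDigitOne("1st"))); // 1<sup>st</sup>
```

Two-digit ordinals such as `19th` are unaffected either way, since `\b1\b` cannot match a `1` that is directly followed by another digit.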
The MCVE below contains the actual regexes, a sample of test input, and a list of expected results to compare with the actual output. Just click the PROCESS button. Thank you for your valuable time.
MCVE
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1, user-scalable=no">
<title>strConv</title>
<style>
section {
  width: 90vw;
  min-height: 250px;
  border: 3px ridge grey;
  padding: 10px;
  margin: 30px auto;
}

button {
  display: block;
  margin: 0 auto;
  font-size: 24px;
}

dt {
  color: blue;
}
</style>
</head>
<body>
<header>
</header>
<section id='editor1' contenteditable="true">
"double quotes"
<br>'single quotes'
<br>em--dash
<br>ellipsis...
<br>19th
<br>1st
<br>fourth
<br>1
<br>2 9 23
<br>
<i>Roe</i> v. <i>Wade</i>
<br>U.S..
<br>%
<br>
</section>
<button id='button1'>PROCESS</button>
<section id='display1'></section>
<footer>
<h3>The content in the brackets [] is what is expected</h3>
<dl>
<dt>Smart Quotes: PASS</dt>
<dd>"double quotes" [“double quotes”]</dd>
<dd>'single quotes' [‘single quotes’]</dd>
<dt>Em Dash: PASS</dt>
<dd>em--dash [em — dash]</dd>
<dt>Ellipsis: PASS</dt>
<dd>ellipsis... [ellipsis…]</dd>
<dt><mark>Ordinals: FAIL</mark></dt>
<dd>19th [19<sup>th</sup>]</dd>
<dd>
<mark>1st [1<sup>st</sup>]</mark>
</dd>
<dd>fourth [fourth]</dd>
<dt>Single Digits: PASS?</dt>
<dd>1 [one]</dd>
<dd>2 9 23 [two nine 23]</dd>
<dt><mark>Court Decision Titles: FAIL</mark></dt>
<dd>
<mark><i>Roe</i> v. <i>Wade</i> [<i>Roe v. Wade</i>]</mark>
</dd>
<dt>United States Abbreviation: PASS</dt>
<dd>U.S.. [US.]</dd>
<dt>Percentages: PASS</dt>
<dd>% [ percent]</dd>
</dl>
</footer>
<script>
document.getElementById('button1').addEventListener('click', stringUI, false);

function stringUI() {
  var editor = document.getElementById('editor1');
  var content = editor.innerText;
  var result = strConv(content);
  var article = document.createElement('article');
  article.innerText = result;
  document.getElementById('display1').appendChild(article);
}

function strConv(str) {
  // Smart Quotes
  str = str.replace(/(^|[-\u2014/(\[{"\s])'/g, "$1\u2018");
  str = str.replace(/'/g, "\u2019");
  str = str.replace(/(^|[-\u2014/(\[{\u2018\s])"/g, "$1\u201c");
  str = str.replace(/"/g, "\u201d");
  // Em Dashes
  str = str.replace(/--/g, " \u2014 ");
  // Ellipsis
  str = str.replace(/\.\.\./g, "\u2026");
  /*FAIL*/// Ordinals
  str = str.replace(/\b([10-9]{1,3})(th|nd|rd|st)\b/g, "$1<sup>$2<\/sup>");
  // Single Digits
  str = str.replace(/\b1\b/g, "one");
  str = str.replace(/\b2\b/g, "two");
  str = str.replace(/\b3\b/g, "three");
  str = str.replace(/\b4\b/g, "four");
  str = str.replace(/\b5\b/g, "five");
  str = str.replace(/\b6\b/g, "six");
  str = str.replace(/\b7\b/g, "seven");
  str = str.replace(/\b8\b/g, "eight");
  str = str.replace(/\b9\b/g, "nine");
  /*FAIL*/// Court Decision Titles
  str = str.replace(/(<i>\w.*)<\/i>(\s\bv\.\b\s)<i>(\w.*<\/i>)/g, "$1$2$3");
  // United States Abbreviation
  str = str.replace(/\bU\.S\.\b|\bU\.S\.(\.)/g, "US$1");
  // Percentages
  str = str.replace(/%/g, " percent");
  return str;
}
</script>
</body>
</html>
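As a side note on the Single Digits rule in the MCVE above, the nine separate `replace()` calls could possibly be collapsed into one pass with a lookup table. This is only a sketch; `DIGIT_WORDS` and `spellSingleDigits` are hypothetical names that do not appear in the original code, and the matching behavior is intended to be the same as the nine `\bN\b` patterns.

```javascript
// Hypothetical lookup table mapping each single digit to its word.
var DIGIT_WORDS = {
  "1": "one", "2": "two", "3": "three",
  "4": "four", "5": "five", "6": "six",
  "7": "seven", "8": "eight", "9": "nine"
};

// Replaces every standalone digit 1-9 in a single pass. The callback
// receives the matched digit and returns its word from the table.
function spellSingleDigits(str) {
  return str.replace(/\b[1-9]\b/g, function (digit) {
    return DIGIT_WORDS[digit];
  });
}

console.log(spellSingleDigits("2 9 23")); // two nine 23
```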