I'm trying to write a script in JS (which I'm fairly new to) in which a user can input a string in x-SAMPA, a notation for transcribing pronunciation, and get back the corresponding string in IPA, another such notation. For lack of an obviously better approach, I have a (quite large) dictionary that maps each x-SAMPA symbol to its IPA counterpart. (For the sake of completeness: https://pastebin.com/srjEYAGU) I also want to be able to swap the key/value pairs and re-use the table for IPA ---> x-SAMPA.
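For the key/value swap, something like the following might be enough. This is only a sketch: xsampaToIPASample is a made-up three-entry excerpt rather than the real table, invertTable is a hypothetical helper name, and it assumes no two x-SAMPA keys map to the same IPA value.

```javascript
// Hypothetical three-entry excerpt; the real table is much larger.
const xsampaToIPASample = { "T": "θ", "N": "ŋ", "r\\": "ɹ" };

// Swap keys and values to get the reverse table.
// Assumes every IPA value is unique; duplicates would silently overwrite each other.
function invertTable(table) {
  return Object.fromEntries(
    Object.entries(table).map(([key, value]) => [value, key])
  );
}

const ipaToXsampaSample = invertTable(xsampaToIPASample);
console.log(ipaToXsampaSample["ɹ"]); // r\
```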
Since str.replace() only replaces the first instance when given a string pattern (replacing every instance requires a global search), and I'm trying to match a string variable, not a string literal, I guess I have to do this with a RegExp():
sInput = "TIs TIN Iz r\\ON"
Object.keys(xsampaToIPAMain).forEach(function(key) {
sInput = sInput.replace(new RegExp(key, 'g'), xsampaToIPAMain[key]);
});
console.log(sInput);
But, 1) x-SAMPA isn't a 1:1 correspondence between sounds and characters; in this case the two characters "r\" collectively refer to one sound, so I can't just go substituting character by character, and 2) x-SAMPA incorporates lots of special regex characters. In particular, as soon as it reaches the first key in the table with backslashes:
"h\\": "ɦ",
Regex throws an error:
"Invalid regular expression: /h\/: \ at end of pattern"
Searching around, it seems the string literal first turns \\ into \, and the regex then tries to use that \ to escape the next character, so the pattern I'd really have to write is h\\\\.
I can't search for something like key+"\\", since most keys don't have any backslashes to work around, and so they'll just crash the regex.
And then there are the other special regex characters like ?, } and ., which are also x-SAMPA characters and need to be escaped (though not, as far as I can tell, in the same way as backslashes).
So - what would be the least messy way to replace all instances of a variable string of varying length with special regex characters that need to be escaped?
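For reference, the rough direction I've been considering is to escape every key before it reaches RegExp, then join all keys into one alternation with the longest keys first, so that "r\" wins over "r". This is only a sketch under assumptions: xsampaToIPASample is a small made-up excerpt rather than the real table, and escapeRegExp and buildConverter are hypothetical helper names, not built-ins.

```javascript
// Hypothetical excerpt of the table; the real one is much larger.
const xsampaToIPASample = {
  "T": "θ", "I": "ɪ", "s": "s", "N": "ŋ", "z": "z", "O": "ɔ",
  "r": "r", "r\\": "ɹ", "h": "h", "h\\": "ɦ", "?": "ʔ"
};

// Put a backslash in front of every character that is special in a regex,
// so the key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global regex out of all keys. Sorting longest-first makes
// multi-character symbols such as "r\" match before their prefix "r".
function buildConverter(table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  // A single pass: each match is looked up in the table exactly once,
  // so already-converted output is never matched again by a later key.
  return (input) => input.replace(pattern, (match) => table[match]);
}

const toIPA = buildConverter(xsampaToIPASample);
console.log(toIPA("TIs TIN Iz r\\ON")); // θɪs θɪŋ ɪz ɹɔŋ
```

Characters that aren't in the table (like the spaces here) are left untouched, and the same function should work on the swapped table for the reverse direction.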