I'm trying to write a script in JS (which I'm fairly new to) in which a user can input a string in x-SAMPA, a notation for transcribing pronunciation, and get as output the corresponding string in IPA, another such notation. Without any apparently better way of doing it, I have a (quite large) dictionary that maps each x-SAMPA character to its IPA counterpart. (For the sake of completeness: https://pastebin.com/srjEYAGU) I also want to be able to swap the key/value pairs and re-use the table for IPA ---> x-SAMPA.

Since str.replace() only replaces the first instance (requiring a global search) and I'm trying to match a string variable, not a string literal, I guess I have to do this with a RegExp():

sInput = "TIs TIN Iz r\\ON"

Object.keys(xsampaToIPAMain).forEach(function(key) {
    sInput = sInput.replace(new RegExp(key, 'g'), xsampaToIPAMain[key]);
});
console.log(sInput);

But, 1) x-SAMPA isn't a 1:1 correspondence between sounds and characters; in this case the two characters "r\" collectively refer to one sound, so I can't just go substituting character by character, and 2) x-SAMPA incorporates lots of special regex characters. In particular, as soon as it reaches the first key in the table with backslashes:

    "h\\":   "ɦ",

Regex throws an error:

"Invalid regular expression: /h\/: \ at end of pattern"

Searching around, it seems the string literal first turns \\ into a single \, and the regex then tries to use that \ to escape the next character, so the pattern I'd actually need to write is h\\\\.
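To make the two layers of escaping concrete, here is a minimal sketch. The sample input "h\\ello" is made up purely for illustration; only the key "h\\" and its value come from the table above.

```javascript
// The string literal "h\\" holds two characters: h and ONE backslash.
const key = "h\\";
console.log(key.length); // 2

// Handing that string to RegExp yields the pattern source h\ ,
// which ends in a dangling escape, so the constructor throws.
let errorMessage = null;
try {
  new RegExp(key, "g");
} catch (e) {
  errorMessage = e.message;
}
console.log(errorMessage);

// Doubling the backslash at the regex level: the literal "h\\\\" is the
// three-character string h\\ , which the regex reads as "h, then a literal \".
const escapedKey = "h\\\\";
const pattern = new RegExp(escapedKey, "g");

// Illustrative input only: the literal "h\\ello" is the text h\ello.
const result = "h\\ello".replace(pattern, "ɦ");
console.log(result); // ɦello
```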

I can't search for something like key+"\\", since most keys don't contain any backslashes to work around, so that would just crash the regex for them. And then there are the other special regex characters, like ? and } and ., that are also x-SAMPA characters and need to be escaped (though not, as far as I can tell, in the same way as backslashes).

So - what would be the least messy way to replace all instances of a variable string of varying length with special regex characters that need to be escaped?
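For reference, one commonly suggested direction is sketched below, under stated assumptions rather than as a definitive answer: escape every regex metacharacter in each key, sort the keys longest first so that "r\\" wins over "r", join them into a single alternation, and replace in one pass. The names escapeRegExp, buildConverter, invertTable and sampleTable are hypothetical, and sampleTable is only a small stand-in for the real xsampaToIPAMain table.

```javascript
// Hypothetical stand-in for a small slice of the real table.
const sampleTable = {
  "r\\": "ɹ",
  "r": "r",
  "h\\": "ɦ",
  "T": "θ",
  "I": "ɪ",
  "N": "ŋ",
  "O": "ɔ",
  "?": "ʔ",
};

// Prefix every regex metacharacter (including the backslash itself)
// with a backslash so the key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global regex out of all keys, longest first, and return
// a function that converts an input string in a single pass.
function buildConverter(table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  return (input) => input.replace(pattern, (match) => table[match]);
}

// Swap keys and values for the reverse direction. Assumes the values
// are unique; duplicates would silently overwrite each other.
function invertTable(table) {
  return Object.fromEntries(
    Object.entries(table).map(([key, value]) => [value, key])
  );
}

const toIPA = buildConverter(sampleTable);
const toXSampa = buildConverter(invertTable(sampleTable));

const converted = toIPA("TIs TIN Iz r\\ON");
console.log(converted); // θɪs θɪŋ ɪz ɹɔŋ

const roundTrip = toXSampa(converted);
console.log(roundTrip); // TIs TIN Iz r\ON
```

Doing the replacement in one pass, rather than looping over the keys, also means that text produced by one substitution is never picked up and rewritten by a later key. Characters that are not in the table, such as s and z here, pass through unchanged.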

Arcaeca