I am trying to find a simple JS way to convert RTF to plain text, and I found that simple solution, which seems satisfactory for my needs. However, all my RTF is in Portuguese, with some Latin-1 characters that the functions in that solution do not replace.
I just added one more regexp to replace RTF's "\'hh" sequences with JavaScript's "\xhh", so I have:
function convertToPlain(rtf) {
    rtf = rtf.replace(/\\par[d]?/g, "");
    rtf = rtf.replace(/\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?/g, "").trim();
    rtf = rtf.replace(/\\'/g, '\\x');
    return rtf;
}
The replacements happen. But, playing with the code in JSFiddle, I can't get the returned string to have its "\xhh" sequences converted into the actual characters. Here's a sample of the result:
a inaugura\xe7\xe3o do novo Castel\xe3o, para as competi\xe7\xf5es
However, if I change the return statement to use the above sample as a literal, like...
return " a inaugura\xe7\xe3o do novo Castel\xe3o, para as competi\xe7\xf5es"
... the characters are substituted as expected:
a inauguração do novo Castelão, para as competições
It seems that something happens with a string variable (but not with a string literal) that causes its special characters not to be substituted. However, I could not find any explanation for this here on SO, nor in MSDN, the W3C docs, or the books I have.
Could somebody please shed some light on this? Thanks!
Fabricio
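A possible explanation, sketched here rather than stated as the definitive answer: "\xhh" is an escape sequence that the JavaScript parser resolves only when it reads a string literal in the source code. A string built at runtime that happens to contain a backslash, an "x" and two hex digits is just those four characters and is never parsed again. The sketch below illustrates that difference and then decodes the RTF "\'hh" sequences directly. The helper name decodeRtfHexEscapes is hypothetical (it is not part of the original solution), and the sketch assumes the RTF bytes are Latin-1, where each byte value equals its Unicode code point.

```javascript
// A literal escape is resolved by the parser: this string holds one character.
const fromLiteral = "\xe7";

// A backslash produced at runtime is only data: this string holds the four
// characters \, x, e, 7 and is never re-parsed as an escape sequence.
const fromReplace = "\\'e7".replace(/\\'/g, "\\x");

console.log(fromLiteral.length, fromReplace.length); // 1 4

// Hypothetical helper: turns each RTF \'hh sequence into the character whose
// code is the hex value hh (assumes Latin-1, so byte value == code point).
function decodeRtfHexEscapes(text) {
  return text.replace(/\\'([0-9a-fA-F]{2})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// In source code a backslash must be doubled to get a real backslash in the string.
const sample = "a inaugura\\'e7\\'e3o do novo Castel\\'e3o";
const decoded = decodeRtfHexEscapes(sample);
console.log(decoded);
```

If this is right, the last replace in convertToPlain could be swapped for a call like decodeRtfHexEscapes(rtf). For RTF that declares Windows-1252 instead of Latin-1, bytes in the 0x80 to 0x9F range would need a lookup table, which this sketch does not cover.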