
We have a web app built in ColdFusion (CF). When we copy text from Word or Outlook, paste it into our comment field, and submit, it is saved to the database with some kind of encoding that shows up as "?" characters.

I have been trying to figure out what this mysterious encoding is so I can perform a replace on it. I have tried many replaces:

// RemoveHTML() is not a built-in CF function; presumably a UDF defined elsewhere in the app
str = RemoveHTML(str);
// Replace tabs
str = Replace(str, Chr(9), " ", "All");
// Replace newlines and carriage returns
str = Replace(str, Chr(10), " ", "All");
str = Replace(str, Chr(13), " ", "All");
// Replace() matches its pattern literally, so regex patterns need REReplace()
str = REReplace(str, "[^0-9A-Za-z ]", "", "All");
str = REReplace(str, "[^\x20-\x7E]", "", "All");
// Replace two or more blanks with one space
str = REReplaceNoCase(str, "[[:blank:]]{2,}", " ", "All");
str = Trim(str);
str = Left(str, 5000);
return str;

Any help going in the right direction would be greatly appreciated.

Thanks so much!

Asked by Scott; edited by rrk
  • Have you tried `canonicalize()`? – James A Mohler Dec 01 '21 at 19:34
  • Not familiar with that function. I will certainly read up on it. How exactly would that help, and what would it do? Thanks! – Scott Dec 01 '21 at 19:56
  • A question mark is normally a hint that the encoding of a character in the source string cannot be displayed in the encoding in the target. This can have several different reasons. Normally, you should aim for UTF-8 encoding. So, is your database field encoded in UTF-8? And is the document that is sending the data UTF-8 encoded? – Sebastian Zartner Dec 01 '21 at 19:59
  • The `canonicalize()` function decodes an input string. You can read more about it at https://cfdocs.org/canonicalize. Though I don't think it will help you in this case, because the function just returns an empty string or throws if the input string contains multiple or mixed encodings. – Sebastian Zartner Dec 01 '21 at 20:04
  • The DeMoronize UDF at cflib.org might help. https://cflib.org/udf/DeMoronize – Dan Bracuk Dec 01 '21 at 20:47
  • You can implement a client-side solution to this using JavaScript. When the user pastes their copied text, you can force the mime type to `text/plain`. See solution [here](https://stackoverflow.com/a/12028136/12031119). – user12031119 Dec 01 '21 at 21:39
  • What database? Check *Enable High ASCII* in the CF Administrator (CFADMIN). – Bernhard Döbler Dec 01 '21 at 22:55
  • Why not use CKEditor instead of a textarea? Your users need to copy and paste from Word and Outlook, and CKEditor has an option that handles content pasted from Word documents and Outlook emails. Please try that. – Kannan.P Dec 02 '21 at 13:34

1 Answer


The character encoding is not being preserved. Text copied from a Word document contains characters (such as curly quotation marks) that the database column can't store. Check the table column's data type to see whether it's VARCHAR or NVARCHAR:

  • NVARCHAR allows Unicode
  • VARCHAR doesn't

Some rich-text (RTF) editors automatically convert text pasted from MS Word. Alternatively, you can capture the paste event and run some JavaScript to replace the common special characters that come from Word documents.

https://developer.mozilla.org/en-US/docs/Web/API/Element/paste_event
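The same character substitution can also be done on the server. Below is a minimal CFScript sketch, in the spirit of the DeMoronize UDF mentioned in the comments, that maps the "smart" punctuation Word and Outlook commonly insert to plain ASCII before the text is saved. `replaceWordChars` is a hypothetical name, the character list is an assumption and not exhaustive, and the sketch assumes the form data reaches CF as proper Unicode (for example, a UTF-8 form page). It only hides the symptom; fixing the column type is still the more robust fix.

```cfml
<cfscript>
/**
 * replaceWordChars() is a hypothetical helper, not a built-in CF function.
 * It swaps the "smart" punctuation Word/Outlook typically insert for plain
 * ASCII. Chr() takes a Unicode code point, e.g. Chr(8220) is the left double quote.
 */
function replaceWordChars(required string str) {
    // Each entry: [Unicode code point, ASCII replacement] (list is not exhaustive)
    var map = [
        [8216, "'"],    // left single quotation mark
        [8217, "'"],    // right single quotation mark / apostrophe
        [8220, '"'],    // left double quotation mark
        [8221, '"'],    // right double quotation mark
        [8211, "-"],    // en dash
        [8212, "--"],   // em dash
        [8230, "..."],  // horizontal ellipsis
        [160, " "]      // non-breaking space
    ];
    var result = arguments.str;
    // Replace every occurrence of each code point with its ASCII stand-in
    for (var i = 1; i <= arrayLen(map); i++) {
        result = Replace(result, Chr(map[i][1]), map[i][2], "All");
    }
    return result;
}
</cfscript>
```

It would be called early in the cleanup, e.g. `str = replaceWordChars(str);` right after `RemoveHTML(str)`.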

Or put a rich-text editor in place of the `<textarea>`; it should make that conversion for you.

https://quilljs.com/

Answered by Adrian J. Moreno