If the CMS can't handle the copyright symbol (U+00A9), there is something quite wrong with it. Assuming that problem cannot be fixed, you have to "encode" such characters somehow. With that approach, anyone writing to or reading from the CMS has to be aware of the encoding, and has to encode on the way in and decode on the way out. This is not a happy path.
For instance, assume the CMS has an editor. How is the user going to input the "special" characters in the editor? Is the editor going to be modified to handle the necessary encoding and decoding?
Anyway, assuming you do decide to go the encoding route, which encoding should you choose? Others have suggested encoding using HTML entities such as `&copy;`. This is probably not the best solution. First, it assumes the content is always going to be output in an HTML environment. Possibly more importantly, it cannot handle characters which do not have named HTML entities. Therefore, using an encoding such as the JS string encoding (`"Bad Idea\u00a9"`) is probably your best bet. If the API to the CMS uses JSON, everything should pretty much work, because JSON uses the same `\uXXXX` escapes and any JSON parser decodes them for you.
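As a rough sketch of that route, assuming the CMS can at least store plain ASCII: the helpers below (`encodeForCms` and `decodeFromCms` are made-up names, not part of any CMS API) produce an ASCII-only JSON string literal on the way in and parse it on the way out.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any real CMS API.

// Encode: JSON.stringify takes care of quotes, backslashes and control
// characters; the replace() then escapes every remaining non-ASCII
// UTF-16 code unit as \uXXXX, so the result is pure ASCII.
function encodeForCms(text) {
  return JSON.stringify(text).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Decode: the stored value is a valid JSON string literal, so JSON.parse
// turns the \uXXXX escapes back into real characters.
function decodeFromCms(stored) {
  return JSON.parse(stored);
}

const stored = encodeForCms("Bad Idea\u00a9");
console.log(stored);                 // "Bad Idea\u00a9" (ASCII only, quotes included)
console.log(decodeFromCms(stored));  // Bad Idea©
```

Because the work is done in UTF-16 code units, characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which `JSON.parse` reassembles.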
Alternative encodings you might consider are URI encoding or BASE64 encoding, but neither seems like a wonderful idea.
Having said that, you seem to be fuzzy on the distinction between character encodings and formatting. You say
> how to detect a specific char like © and make it small and align top
If you already have a real copyright symbol, then you don't need to do anything, because practically every font will display it correctly. For instance, if you have encoded the copyright symbol in your database as `\u00a9` and are sending that down in JSON, it will already be a copyright symbol once the JSON is parsed, and will be displayed correctly.
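To make that concrete, here is a small check, assuming the CMS response is JSON; the `footer` field and its value are invented for the example.

```javascript
// A made-up JSON payload, as it might arrive over the wire.
// The JSON text contains the six ASCII characters \u00a9.
const payload = '{"footer": "\\u00a9 2024 Example Corp"}';

// After parsing, the escape has become one real character.
const footer = JSON.parse(payload).footer;

console.log(footer);                              // © 2024 Example Corp
console.log(footer.charCodeAt(0).toString(16));   // a9
```

No further detection or conversion step is needed before handing `footer` to the page.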
Or are you proposing to store the copyright symbol in the CMS as the three characters "(c)", and treat that as a copyright symbol for formatting/display purposes? In that case, yes, you would need to detect such sequences and wrap them in a bit of HTML which applies the relevant CSS properties.
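If that is the plan, the detection could look something like the sketch below. The function names and the `copyright-mark` class are assumptions for illustration; the class would need a matching CSS rule such as `.copyright-mark { font-size: smaller; vertical-align: top; }`.

```javascript
// Hypothetical helpers; names and the CSS class are illustrative only.

// Escape the text first so that stored content cannot inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace every "(c)" or "(C)" with a real copyright symbol wrapped in a
// span that the stylesheet can shrink and align to the top.
function markCopyright(text) {
  return escapeHtml(text).replace(
    /\(c\)/gi,
    '<span class="copyright-mark">\u00a9</span>'
  );
}

console.log(markCopyright("(c) 2024 Example Corp"));
```

Note that a plain pattern like this also matches text such as "section 3(c)", so you may want a stricter rule depending on your content.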