You have multiple questions.
The "???" probably came from converting from latin1 to utf8 incorrectly. The data is now lost, since only '?' remains. Run SELECT HEX(...) ... to confirm that all you get is 3F (?) where you should get something useful.
See "question marks" in Trouble with utf8 characters; what I see is not what I stored.
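To see why HEX shows only 3F, here is a small Python simulation (not MySQL itself) of Cyrillic text being forced into latin1. Any character latin1 cannot represent becomes '?', byte 0x3F, and nothing in that byte says which letter it used to be. The sample word is an assumption, and Python's latin-1 codec only approximates MySQL's latin1 (which is really cp1252); for Cyrillic both behave the same.

```python
# Simulate forcing Cyrillic text into latin1: unrepresentable
# characters become '?' (0x3F), and the original letters are gone.
original = "Привет"  # hypothetical sample value

# Correctly stored as utf8: each Cyrillic letter is two bytes.
good = original.encode("utf-8")

# Forced into latin1: every letter is replaced by '?'.
lost = original.encode("latin-1", errors="replace")

print("utf8 HEX:  ", good.hex().upper())  # D09FD180D0B8D0B2D0B5D182
print("latin1 HEX:", lost.hex().upper())  # 3F3F3F3F3F3F
```

If your HEX output looks like the second line, no ALTER can bring the text back; only the original data can.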
utf8mb4 and utf8 handle Cyrillic (Russian) identically, so the CHARACTER SET is not the issue with respect to the "???".
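The reason both charsets behave the same here: MySQL's utf8 stores characters of up to 3 bytes, utf8mb4 up to 4, and every Cyrillic letter is 2 bytes in UTF-8. A quick Python check (the helper name max_utf8_bytes and the sample sentence are illustrative assumptions):

```python
# MySQL's utf8 holds characters of up to 3 bytes; utf8mb4 allows 4.
# Text whose characters all fit in 3 bytes is stored the same by both.
def max_utf8_bytes(text: str) -> int:
    """Longest UTF-8 encoding of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

russian = "Съешь же ещё этих мягких французских булок"  # sample text
print(max_utf8_bytes(russian))  # 2, well within utf8's 3-byte limit
```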
If you have an original copy of the data, then you probably want the 3rd item in here -- "CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset". That is what I call the two-step ALTER.
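The "leave bytes alone" idea can be sketched as a Python simulation (not the ALTER statements themselves). A latin1 column holding utf8 bytes displays as garbage, yet relabeling those same bytes as utf8 recovers the text; a one-step conversion instead treats each byte as a latin1 character and re-encodes it, which double-encodes the data. The sample value is hypothetical, and Python's latin-1 codec stands in for MySQL's latin1.

```python
# Bytes a utf8 client wrote into a column declared latin1 (hypothetical).
stored = "Привет".encode("utf-8")

# What the column "shows" while it is labeled latin1: garbage characters.
shown_as_latin1 = stored.decode("latin-1")

# Two-step idea: keep the bytes, change only the label to utf8.
relabeled = stored.decode("utf-8")

# One-step conversion: treat each byte as a latin1 character and
# re-encode it to utf8, which double-encodes the data.
converted = shown_as_latin1.encode("utf-8")

print(relabeled)                    # Привет
print(len(stored), len(converted))  # 12 24: every high byte doubled
```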
As for avoiding future issues, see "Best Practice" in my first link. If all you need is European languages (including Russian), either utf8 or utf8mb4 will suffice. But if you want Emoji or all of Chinese, go with utf8mb4.
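What pushes you to utf8mb4 is any character outside the Basic Multilingual Plane (code point above U+FFFF), since those take 4 bytes in UTF-8. Emoji live there, as do rarer CJK ideographs such as CJK Extension B. A minimal Python check, with needs_utf8mb4 as a hypothetical helper name:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8 (beyond MySQL utf8)."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_utf8mb4("Москва"))              # False: Cyrillic is 2 bytes
print(needs_utf8mb4("Hello \U0001F600"))    # True: emoji are 4 bytes
print(needs_utf8mb4("\U00020000"))          # True: CJK Extension B
```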
Also, note that you must specify what charset the client is using; this is a common omission, and was probably part of what got you in trouble in the first place.
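A simplified model of why the client declaration matters (in MySQL it is made with SET NAMES or the connector's charset setting): the server decodes incoming bytes using the charset the client declared, then encodes them into the column's charset. The function store below is hypothetical and only approximates that path in Python; the sample value is also an assumption.

```python
def store(client_bytes: bytes, declared: str, column: str) -> bytes:
    """Hypothetical model: decode with the client's declared charset,
    then re-encode into the column's charset ('?' if unmappable)."""
    text = client_bytes.decode(declared)
    return text.encode(column, errors="replace")

sent = "Привет".encode("utf-8")  # the client really sends utf8 bytes

ok = store(sent, "utf-8", "utf-8")         # declared correctly
garbled = store(sent, "latin-1", "utf-8")  # declared latin1: double-encoded

print(ok.decode("utf-8"))         # Привет
print(len(sent), len(garbled))    # 12 24
```

Same bytes, same column; only the declaration differs, and that alone decides whether the data arrives intact.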