Your sample looks like this in my browser:
When you are posting a question about how characters are rendered, you should always include an image. The characters may not render the same on other people's computers as they do on yours. They could even get re-encoded by the Stack Overflow server. In this answer I assume that SO is delivering the same bytes that you posted and that I am seeing the same thing that you see.
Your characters are delivered in UTF-8 by the database, but they are being rendered as Windows-1252. The first question is whether Perl knows that it is getting UTF-8 characters. `length $tring` will tell you how many characters Perl thinks it sees. If you get 7, then Perl knows that the data is in UTF-8. If you get 14, then Perl does not know what it has, so it is counting each byte as one character. If you get 12, then Perl has already decided that the data is in Windows-1252 (two of your bytes, 0x81 and 0x8d, being discarded as invalid characters).
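
As a rough illustration of the 7 versus 14 distinction, here is a minimal sketch. It assumes the sample is the 14-byte UTF-8 sequence shown in hex at the end of this answer; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The 14 UTF-8 bytes of the sample, written as hex escapes so the result
# does not depend on the encoding of this source file.
my $bytes = "\xcf\x83\xcf\x84\xce\xb1\xcf\x8d\xcf\x81\xce\xbf\xcf\x82";

# Undecoded: Perl treats every byte as one character.
my $byte_count = length $bytes;

# Decoded as UTF-8: Perl now sees the actual characters.
my $chars      = decode('UTF-8', $bytes);
my $char_count = length $chars;

print "undecoded: $byte_count\n";   # 14
print "decoded:   $char_count\n";   # 7
```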
My guess is that you'll get 14, so Perl writes the bytes out unchanged and the shell displays them in its default encoding. Are you on a Windows machine? If you get either 12 or 14, then you need to tell Perl that the input data is in UTF-8. If you're reading from a file handle, then you just need to insert the line `binmode FH, ':encoding(UTF-8)'` right after you open the file handle. My guess is that you are using a database API package. If so, then you need to read the documentation for the package to see how to set the encoding.
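
For the file handle case, a self-contained sketch might look like the following. The temporary file stands in for your real input and is purely an assumption for the demo; the only line that matters is the `binmode` call right after `open`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the raw UTF-8 bytes of the sample to a temporary file (demo input only).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;    # raw bytes, no encoding layer
print {$tmp_fh} "\xcf\x83\xcf\x84\xce\xb1\xcf\x8d\xcf\x81\xce\xbf\xcf\x82\n";
close $tmp_fh or die "Cannot close $path: $!";

# Reopen it and tell Perl that the input is UTF-8.
open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh, ':encoding(UTF-8)';

my $line = <$fh>;
chomp $line;
close $fh;

my $len = length $line;
print "length: $len\n";    # 7, because the bytes were decoded on read
```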
If `length $tring` gives 7, then Perl knows what it has, and the problem is on the output. If you want help with that, then you'll need to add details to your question about how you are viewing the output. If you are just printing to the terminal, then try `binmode STDOUT, ':encoding(UTF-8)'` before you start printing.
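
A small sketch of the output side, assuming the string is already decoded (here it is built from code points so that its length is 7). The in-memory handle is not something you need in your own program; it is only there to show which bytes the same encoding layer produces.

```perl
use strict;
use warnings;

# A decoded 7-character string, built from code points.
my $chars = "\x{3c3}\x{3c4}\x{3b1}\x{3cd}\x{3c1}\x{3bf}\x{3c2}";

# Encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";

# Apply the same layer to an in-memory handle to count the bytes written.
my $buffer = '';
open my $mem, '>:encoding(UTF-8)', \$buffer or die "Cannot open in-memory handle: $!";
print {$mem} $chars;
close $mem;

my $bytes_written = length $buffer;
print "bytes written: $bytes_written\n";    # 14
```

Whether the terminal then shows the right glyphs still depends on the terminal itself being set to UTF-8.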
If you want to inspect the data as Perl sees it, then use `unpack 'H*', $tring`. You will either get `cf83cf84ceb1cf8dcf81cebfcf82` or `cf83cf84ceb1cfcfcebfcf82`, depending on whether Perl has already discarded the two invalid Windows-1252 bytes.
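
The two hex dumps can be reproduced with a short sketch. The `tr` step is only my simulation of the two bytes being dropped (0x81 and 0x8d have no character assigned in Windows-1252); it is an assumption about what happened upstream, not necessarily the exact mechanism.

```perl
use strict;
use warnings;

# The undecoded sample: 14 raw UTF-8 bytes.
my $bytes = "\xcf\x83\xcf\x84\xce\xb1\xcf\x8d\xcf\x81\xce\xbf\xcf\x82";

# Hex dump of the intact data.
my $intact_hex = unpack 'H*', $bytes;

# Simulate the loss of the two bytes that are undefined in Windows-1252.
(my $stripped = $bytes) =~ tr/\x81\x8d//d;
my $stripped_hex = unpack 'H*', $stripped;

print "$intact_hex\n";      # cf83cf84ceb1cf8dcf81cebfcf82
print "$stripped_hex\n";    # cf83cf84ceb1cfcfcebfcf82
```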