My webpage used to use the UTF-8 charset, and it inserted a lot of content into my SQL Server 2008 database with this charset.

Now my webpage uses the ISO-8859-1 charset, but it still uses the same content from the database. My problem is that the content in the database is still in the old charset.

Is there a way to convert everything in the database from one charset to another? Once and for all, or through the connection string?

MicBehrens
  • I'm curious why you would switch **from** utf-8 **to** an ISO encoding? Usually it is the other way around. Utf-8 can represent every Unicode codepoint that an ISO can. The world is trying to move away from ISO and other Ansi encoding and embrace Unicode. – Remy Lebeau Mar 27 '12 at 08:53
  • It doesn't really matter which way I would be going. I just need to know if there is a way to do this... – MicBehrens Mar 27 '12 at 09:53
  • Assuming you stored your UTF-8 data in `char/varchar/text` columns, you would have to read the data as Unicode using a UTF-8 collation to allow for proper conversions. You would then have to update your ASP code to convert that Unicode data to ISO yourself before sending it to the client. So it doesn't make sense to switch to ISO, the data is UTF-8 so send it to the client as UTF-8. In the future, design your databases to use `nchar/nvarchar` to avoid issues with foreign text. – Remy Lebeau Mar 27 '12 at 19:23
  • I actually am storing it as `varchar/text`. The future design is duly noted, thanks! Setting `response.codepage` solved the whole problem :) But thanks for the nvarchar/ntext point :) – MicBehrens Mar 28 '12 at 09:48

1 Answer

Well, first of all, you are probably already using an NVARCHAR or NTEXT field in your database. Hence the content of the field is encoded as Unicode.

It would be nice to assume that your original posting form posted using UTF-8 encoding and that your receiving page had its Response.CodePage set to 65001, so that the incoming string was stored in the database with fidelity.

If the foregoing is true, then sending the content to the client in a new charset is a simple matter of setting the page codepage correctly; for ISO-8859-1 we use codepage 1252. With the codepage set to 1252, any data sent using Response.Write will be converted from the native Unicode to the 1252 codepage.
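To picture what that conversion does, here is a minimal sketch in Python rather than ASP. It is only an analogy: the sample string and variable names are made up for illustration, and Python's `"utf-8"` and `"cp1252"` codecs stand in for codepages 65001 and 1252. It shows that the same Unicode text becomes different bytes depending on the target codepage.

```python
# Illustration only: Python codecs standing in for ASP codepages.
# The sample text is hypothetical; any text with Latin-1 letters would do.
text = "Smørrebrød café"

utf8_bytes = text.encode("utf-8")      # roughly what codepage 65001 would emit
cp1252_bytes = text.encode("cp1252")   # roughly what codepage 1252 would emit

# Each of the three non-ASCII letters takes two bytes in UTF-8, one in cp1252.
print(len(text), len(utf8_bytes), len(cp1252_bytes))

# Both byte sequences decode back to the same text, as long as the reader
# uses the same encoding that the writer used.
assert utf8_bytes.decode("utf-8") == text
assert cp1252_bytes.decode("cp1252") == text
```

The point is that the Unicode string itself does not change; only its byte representation on the wire does, which is why the declared charset and the codepage have to agree.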

However, it is also quite possible that corrupt data has been stored in the DB while everything still looked fine in HTML. See my answer here to an older question for details on how that might happen. That same answer contains the steps needed to repair the data in the DB. After that, setting the output codepage should be sufficient.
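The kind of corruption described here can be sketched in Python. This is not the repair procedure from the linked answer, only an illustration of the mechanism, and the function names are hypothetical. It assumes UTF-8 bytes were stored as if they were single-byte 1252 characters.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate UTF-8 bytes being treated as Windows-1252 characters."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "café"
garbled = misread_utf8_as_cp1252(original)   # each accented letter becomes two characters
repaired = repair_mojibake(garbled)

print(garbled)
print(repaired)
```

If the garbled text is later written back out byte for byte and the browser reads it as UTF-8, it looks correct again, which is how such data can go unnoticed until the charset changes. The sketch only covers bytes that cp1252 defines; a real repair would need to handle the handful of byte values that have no 1252 mapping.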

Note that the ASP file itself should be saved as Windows-1252 and not UTF-8; otherwise any non-ASCII static content in the file will be interpreted incorrectly by the client.

AnthonyWJones
  • I didn't use `nvarchar/ntext`. I do now. - I hadn't set `response.codepage` and `response.charset`. I do now and this solved EVERYTHING! :) Thanks again for your help Anthony. As usual I'm learning a lot :) – MicBehrens Mar 28 '12 at 09:47