
As you might expect, I am dealing with a legacy web application that mainly uses Windows-1252 as its charset.

I also wrote a small set of libraries, one of which contains accented characters. These files are in UTF-8 and are included into the legacy code. So here I am:

                .------------.                              .-----------------.
                |   UTF-8    |                              |   Windows-1252  |
                |------------|                              |-----------------|
                | Dim str    | <-------- inclusion -------- | Dim str2        |
                | str = "é"  |                              | str2 = "è"      |
                |____________|                              |_________________|

It seems that str2 will be processed as if "è" were encoded as UTF-8, although that is not the case.
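To show the mismatch at the byte level, here is a minimal sketch in Python. It is not the ASP engine and makes no claim about how IIS decodes included files. It only shows what the same bytes look like under each charset. All variable names are hypothetical and chosen for illustration.

```python
# Byte-level illustration only: this is Python, not classic ASP.
# Escapes are used so the example does not depend on how this file is saved.
E_ACUTE = "\u00e9"  # é, the literal in the UTF-8 library file
E_GRAVE = "\u00e8"  # è, the literal in the Windows-1252 legacy file

utf8_bytes = E_ACUTE.encode("utf-8")      # two bytes: C3 A9
cp1252_bytes = E_GRAVE.encode("cp1252")   # one byte: E8

# UTF-8 bytes read as Windows-1252: no error, but two wrong characters.
utf8_read_as_cp1252 = utf8_bytes.decode("cp1252")

# Windows-1252 byte read as UTF-8: E8 announces a 3-byte sequence that
# never arrives, so a strict decoder rejects it.
try:
    cp1252_bytes.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# A lenient decoder substitutes U+FFFD (the replacement character).
cp1252_read_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Pure ASCII source text is byte-identical in both charsets, which is
# why only the accented literals are affected.
ascii_identical = "Dim str".encode("utf-8") == "Dim str".encode("cp1252")

print(utf8_bytes.hex(), cp1252_bytes.hex())
print(repr(utf8_read_as_cp1252), repr(cp1252_read_as_utf8))
print(strict_decode_ok, ascii_identical)
```

Whichever single charset is applied to both files, one of the two literals ends up misread. Only the ASCII parts of the code survive unchanged.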

I know that non-ASCII literals should not be written in the code, but it's legacy. Moreover, I admit that I don't really want to downgrade the UTF-8 files to Windows-1252. So I'm looking for a clean way to tell the engine the correct charset of string literals before it converts them to its internal representation. Response.CodePage doesn't seem to be relevant, or doesn't work. Is there a clean way to solve this issue without converting the new files to Windows-1252?

Amessihel
  • Frankly, I think a mixture is the worst of both worlds. If you absolutely don't have the time to bite the bullet and convert the legacy stuff to UTF-8 then saving your new files as Win 1252 is probably what you should be thinking about doing – John Mar 01 '16 at 13:57
  • I knew it... Actually I wanted to "isolate" some Win1252 files while aiming to convert the whole legacy stuff. Thanks @John for your comment. – Amessihel Mar 01 '16 at 14:10
  • *"I know that non-ASCII litterals should not be written in the code"* - Eh why? – user692942 Mar 01 '16 at 14:34
  • You should be using the `@CodePage = ` directive to tell IIS how to process the file and make sure that the ASP file is saved in the correct encoding to match the directive. `Response.CodePage` tells ASP how to return responses not process them in the first place. Some guidelines [here](http://stackoverflow.com/a/17680939/692942) – user692942 Mar 01 '16 at 14:40
  • @Lankymart `@CodePage` cannot be used more than once. – Amessihel Mar 01 '16 at 14:45
  • It has to be the first line in the file; it tells IIS how to process the file, so you don't need it more than once. If you then want to force responses into different code pages, use `Response.CodePage` in conjunction with `Response.Write`. Remember, though, that an HTTP response should only contain one codepage once the response is sent to the client, otherwise you will get encoding mismatches all over the place. The `@codepage` directive should always match the physical encoding of the file, so it will never be needed more than once; one file can only be encoded one way. – user692942 Mar 01 '16 at 14:47
  • @Lankymart Actually it's not about the HTML response, but about string representation... str1 and str2 are used to parse a file, not to output text to the browser. Also, here we have two files with different encodings. Each should have its own `@codepage`, though that's not possible. (Sorry for misspelling your nickname.) – Amessihel Mar 01 '16 at 14:50
  • It's Mart. Regardless the same rules apply whether you are sending a HTTP response or dealing with something on the server. But seen as though you provide no code sample, how do you expect us to second guess your intent for the ASP file? – user692942 Mar 01 '16 at 14:51
  • So @Lankymart, if I understand : 1/ `@Codepage` at the first line of the including file, 2/ `Response.CodePage` into each included file ? – Amessihel Mar 01 '16 at 14:55
  • Slow down, who said anything about `#include` files?...rules are parent ASP file contains the `@codepage` directive pointing to the same encoding used to save that physical file. Includes referenced in that file need to match the encoding of the parent and the `@codepage` directive or you will get encoding mismatches. Only the parent ASP file can contain the `@codepage` directive. Might be useful [Setting the Code Page for String Conversions](https://msdn.microsoft.com/en-us/library/ms525789(v=vs.90).aspx) – user692942 Mar 01 '16 at 14:59
  • @Lankymart, _"Includes referenced in that file need to match the encoding of the parent"_ ... ?? My question is about dealing with the inclusion of files with mismatched charsets. I don't think I could provide sample code more relevant than the diagram above. – Amessihel Mar 01 '16 at 15:07
  • Sorry, that wasn't clear from the question. The answer is you will always get encoding mismatches if you `#include` files that do not match the encoding of the parent ASP file you include them in. Sometimes it's not obvious, because most ASCII characters map like for like to UTF-8, but with other character sets, such as Eastern European ones, you will begin to notice the problem with some accented characters. In theory you can just fudge it and hope it works, but it can lead to a world of hurt later on. – user692942 Mar 01 '16 at 15:29
  • @Lankymart, indeed, that's along the same lines as John's comment. – Amessihel Mar 01 '16 at 16:03

0 Answers