0

I'm attempting to provide support for a legacy ASP/MSSQL web application - I wasn't involved in the development of the software (the company that built it no longer exists) & I'm not the admin of the server where it's hosted, I just manage the hosting for the owners of the site via a reseller account. I'm also not an ASP developer (more a PHP guy), and am not that familiar with it beyond the basics - updating DB connection strings after server migrations, etc.

The issue is that the site in question stores the content of individuals pages in an MSSQL database, and much of the content includes links. Almost all of the internal links on the site are format like "main.asp?123" (with "123" being the ID of a database row). The problem is, starting sometime in the last 8 months or so*, something caused the links in the DB content to show up as "main.aspÀ123" instead - in other words, the "?" character is being replaced by the "À" character (capital A with grave accent). Which, of course, breaks all of those links. Note that Stackoverflow won't allow me to include that character in the post title, because it seems to think that it indicates I'm posting in Spanish...?

(*unfortunately I don't know the timing beyond that, the site owners didn't know when the issue started occurring, so all I have to go by is an archive.org snapshot from last October, where it was working)

I attempted to manually change the "?" character in one of the relevant DB records to "?" (the HTML entity for the question mark), but that didn't make any difference. I also checked the character encoding of the HTML code used to display the content, but that doesn't seem to be the cause either - the same ASP files contain hard-coded links to some of the same pages (formatted exactly the same way), and those work correctly: the "?" doesn't get replaced.

I've also connected to the database directly with the MSSQL Management Studio Express application, but couldn't find any charset/character encoding options for either the database or the table.

And I've tried contacting the hosting provider, but they (M247 UK, in case anyone is curious) have been laughably unhelpful. The responses from them have been along the lines of "durrrrrr, we checked a totally different link that wasn't actually the one that you clearly described AND highlighted in a screenshot, and it works when we check the wrong link, so the problem must be resolved, right?" Suffice it to say, I wouldn't recommend them - used to be a customer of RedFox hosting, and the quality of customer has dropped off substantially since M247 bought them.

Any suggestions? If this were PHP/MySQL, I'd probably start by creating a small test script that did nothing but fetch one of the relevant records and display it's contents, to narrow down the issue - but I'm not familiar enough with ASP to do that here, at least not without a fair amount of googl'ing (and most of the info I can find is specific to ASP.net instead).

Edit: the thread suggested as a solution appears to be for character encoding issues when writing to MSSQL, not reading from it - and I've tried the solutions suggested in that thread, none make any difference.

StephenB
  • 91
  • 1
  • 7
  • Possible duplicate of [Convert UTF-8 String Classic ASP to SQL Database](https://stackoverflow.com/questions/21866225/convert-utf-8-string-classic-asp-to-sql-database) – user692942 Jun 20 '19 at 17:43
  • Nope, doesn't appear to be - at least, I've tried all of the suggestions in that post, and no difference. – StephenB Jun 21 '19 at 19:07
  • If you've looked through that answer and it’s still not working then there is something you are not telling us or you don’t understand Classic ASP enough to follow and implement the solution in either case it doesn’t warrant yet another question on fixing encoding mismatches in ASP. – user692942 Jun 21 '19 at 20:03
  • While (as I stated in the OP) I'm not that familiar with ASP, I *do* know enough to be able to copy-paste a few lines of code into a file & then call it via an include statement. And I tried that with all of the sample code suggested in the various answers in various different ways (includes, inline, etc). Also, note that that question is about problems when WRITING to MSSQL, while the problem I had was related to content being read from MSSQL (or, as it turns out, apparently when it was output by the ASP code). That said, I was able to find a workaround. – StephenB Jun 22 '19 at 19:34
  • Useful read [Internationalization and Classic ASP](http://www.hanselman.com/blog/InternationalizationAndClassicASP.aspx) – user692942 Jun 22 '19 at 23:46

2 Answers2

0

Looks like you're converting from UNICODE to ASCII somewhere along the line...

Have a look at this to get a quick demo of what happens. In particular, pay attention to the ascii derived from inr, versus the ascii derived from unicode...

SELECT 
    t.n,
    ascii_char = CHAR(t.n),
    unicode_char = NCHAR(t.n),
    unicode_to_ascii = CONVERT(varchar(10), NCHAR(t.n))
FROM (
    SELECT TOP (1024)
        n = ROW_NUMBER() OVER (ORDER BY ao.object_id)
    FROM 
        sys.all_objects ao
    ) t
WHERE 1 = 1
    --AND CONVERT(varchar(10), NCHAR(t.n)) ='À'
;
Jason A. Long
  • 4,382
  • 1
  • 12
  • 17
  • *"Somewhere"* - nice and vague but very apt when trying to workout encoding mismatches. – user692942 Jun 20 '19 at 17:41
  • @Jason - I'm assuming that code snippet needs to be enclosed in "<%" and "%>"? (I'm not that experienced with ASP). Without that, when viewing the script, I see the actual code rather than whatever it's supposed to output. When I put "<%" and "%>" around it, I get this error message: `Microsoft VBScript compilation error '800a03fd' Expected 'Case' /db-issue-test.asp, line 4 SELECT ------^` – StephenB Jun 21 '19 at 19:14
  • (Sorry for the wonky formatting, it looks like using backticks to for code formatting removes all linebreaks for some reason? Then I hit the "comments can only be edited for 5 minutes" limitation...) – StephenB Jun 21 '19 at 19:21
  • It was just intended to be run in SQL Server SSMS as an easy demo to see how certain characters don't encode correctly when converting from UNICODE to ASCII. I don't do any front end coding so I wouldn't be much help creating test code outside of SQL Server. – Jason A. Long Jun 21 '19 at 22:10
  • Ah, gotcha - duh, I should have recognized that as a query. I ran it in SSMS, but I'm not really sure how to interpret the output (I'd paste it, but it's more than the character limit here will allow. It's a header row that looks like this: `n ascii_char unicode_char unicode_to_ascii` And then about a thousand rows that look like this: `35 # # #` – StephenB Jun 22 '19 at 18:08
0

I found a workaround that appears to do the trick: I was previously trying to replace the ? in the code with &#63 (took out the ; so that it will show the code rather than the output), which didn't work. BUT it seems to work if I use &quest instead.

One thing to note, it seemed that I was originally incorrect in thinking that the issue was only affecting content being read/displayed from the MSSQL DB. Rather, it looks like the same problem was also occurring with static content being "echo'd" by code in the ASP scripts (I'm more of a PHP guy, not sure the correct term is for ASP's equivalent to echo is). Though the links that were hardcoded as static (rather HTML being dynamically output by ASP) were unaffected. Though chancing the ? to &quest worked for those ones too (hardest part was tracking down the file I needed to edit).

user692942
  • 16,398
  • 7
  • 76
  • 175
StephenB
  • 91
  • 1
  • 7
  • That is only a workaround *(and a poor one at that)* it doesn't fix the underlying issue that is causing the encoding mismatch in the first place. There are a few things to remember in relation to encoding in Classic ASP; 1. The encoding used to save the ASP page is important and should always match how IIS is going to read it *(this also applies to any `#include` files)* which can be controlled by the `@CodePage` directive at the start of an ASP page. 2. The behaviour of dynamic strings *(echo equivalent know as `Response.Write()`)* can be controlled using the `Response.CodePage` property. – user692942 Jun 22 '19 at 23:36
  • The workaround I clearly described as "a workaround" is... a workaround? You don't say. "Perfect is the enemy of 'good enough'" comes to mind; in the context mentioned in the OP, anything more than band-aid fixes and less than a complete rebuild for that site is just throwing good money after bad. That said, if you have a better suggestion, I'm more than happy to try it - the "codepage" changes were included in the answers you suggested earlier, and as I said they didn't make any difference with the issue on the site I was trying to fix. – StephenB Jun 24 '19 at 14:03
  • In which case, as I said from the very beginning, there is something you aren't explaining if you've followed those steps and are still having issues. Useful bed-time reading - [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – user692942 Jun 24 '19 at 14:11
  • The specific details that you believe that I've omitted are...? Or is that an assumption? I'm not sure what you think you're accomplishing by arguing the point on an issue where I've already found & detailed a fix that's "good enough" given the circumstances. If you know of a better one (or a *specific* problem with the workaround), I'm more than happy to entertain it. But I'm not going to obsess over finding an "elegant" solution to a fairly minor problem, not for a site that has had much more serious issues found over the years (as in SQL injection flaws, unsecured file upload scripts, etc). – StephenB Jun 25 '19 at 16:08
  • That's fine just wanted other people who come across this question to understand that this isn't how to fix encoding mismatches in Classic ASP. You could easily build a script to modify all the URLs and replace the mismatched character but that doesn't mean you should, better approach is to first learn why "weird characters" are appearing in the db stored URLs and work back from that to identify where the encoding error occurs. This is just one example, what happens when you try to do more elaborate things with UTF-8 using the same code base? You'll find out...or your predecessor will. GL – user692942 Jun 25 '19 at 17:02