0

I am generating a CSV file from a Microsoft SQL database that was provided to me, but somehow there are invalid characters in about two dozen places throughout the text (there are many thousands of lines of data). When I open the CSV in my text editor, they display as red, upside-down question marks (there are two of them in the attached screenshot).

When I copy the character and view the "find/replace" dialog in my text editor, I see this:

\x{0D}

...but I have no idea what that means. I need to modify my script that generates the CSV so it strips these characters out, but I don't know how to identify them. My script is written in Classic ASP.

Screenshot of invalid characters in text editor

BB Design
  • 695
  • 12
  • 26
  • 1
    It's called an *"encoding mismatch"* and happens when you convert from one encoding to another incorrectly. You need to identify the correct encoding and tell Classic ASP to use that when sending the response. – user692942 Nov 19 '19 at 19:39
  • 2
    Relevant: [convert utf-8 to iso-8859-1 in classic asp](//stackoverflow.com/a/17680939), [Convert UTF-8 String Classic ASP to SQL Database](//stackoverflow.com/a/21914278) and [Understanding How IIS Processes ASP Encoding](https://stackoverflow.com/a/34356187/692942) explain the encoding problem and hwo to fix it. – user692942 Nov 19 '19 at 19:48

2 Answers2

1

You can also use RegEx to remove unwanted characters:

Set objRegEx = CreateObject(“VBScript.RegExp”)
objRegEx.Global = True
objRegEx.Pattern = “[^A-Za-z]”

strCSV = objRegEx.Replace(strCSV, “”)

This code is from the following article which explains in details what it does: How Can I Remove All the Non-Alphabetic Characters in a String?

In your case you will want to add some characters to the Pattern:

^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$
Étienne Laneville
  • 4,697
  • 5
  • 13
  • 29
  • 2
    Sorry but the fundamental issue is **the encoding is wrong**, that is why they get *invalid* characters. Just giving them a method to strip out those characters doesn't solve the fundamental issue...they don't understand character encoding in Classic ASP. An approach like this will kill any accented characters or other symbols that might exist already in the data. – user692942 Nov 19 '19 at 19:41
  • My plan was to make a list of ALL valid characters, including accented characters and so forth, and strip out everything else. Do you think that would not work? – BB Design Nov 20 '19 at 12:02
-1

You can simply use the Replace function and specify Chr(191) (or "¿" directly):

Replace(yourCSV, Chr(191), "")

or

Replace(yourCSV, "¿", "")

This will remove the character. If you need to replace it with something else, change the last parameter from "" to a different value ("-" for example).

In general, you can use charmap.exe (Character Map) from Run menu, select Arial, find a symbol and copy it to the clipboard. You can then check its value using Asc("¿"), this will return the ASCII code to use with Chr().

Étienne Laneville
  • 4,697
  • 5
  • 13
  • 29
  • 1
    I can try that, but my assumption is that the red, upside-down question mark is just what the text editor displays because it doesn't recognize the character, not because that is the actual character in the text. But yes, the REPLACE command is what I was intending to use, once I can identify the character. – BB Design Nov 19 '19 at 18:50
  • I see. I added a second answer that uses a better approach to the problem I think. – Étienne Laneville Nov 19 '19 at 19:15
  • 1
    @BBDesign It doesn't recognise the character because the Classic ASP script is using the wrong encoding to return the data. – user692942 Nov 19 '19 at 19:45