0

I have a website in Classic ASP. I need to export some data from a database to CSV files.

Some of the data (from the database) is in numeric character reference (NCR). These characters all start with the characters "&#". These are actually Chinese characters represented in NCR (i.e. a number string spelling out the unicode character number). For example: 香辣猪

How do I decode these Chinese characters which are currently in NCR to their actual Chinese characters (maybe in unicode format), in the exported CSV file, so that when I open up this file in Excel or Google Sheets, these Chinese characters will show up properly (display the actual Chinese characters)?

For example 香辣猪 should be actually displayed as 香辣猪

In Excel, I can actually use the following to do the conversion:

=UNICHAR(39321)&UNICHAR(36771)&UNICHAR(29482)

But, I would like to pre-convert those NCR to unicode when exporting to CSV. Is there a way to do this? What is the equivalent of UNICHAR in Classic ASP?

TylerH
  • 20,799
  • 66
  • 75
  • 101
J K
  • 519
  • 5
  • 9
  • 28
  • The following may be of relevance : https://stackoverflow.com/questions/1856239/classic-asp-how-to-write-unicode-string-data-in-classic-asp and https://www.hanselman.com/blog/internationalization-and-classic-asp – Jon P Apr 04 '22 at 04:54
  • Thanks, but I need to convert the numbers to actual Chinese characters in unicode (or another encoding), which Excel or Google Sheets can directly display. – J K Apr 04 '22 at 05:26
  • you can use `String.fromCharCode(txt.match(/(\d+);/)[1])` to decode – Mike Steelson Apr 04 '22 at 06:55
  • 1
    Please note that I am using classic ASP, which doesn't support such functions. – J K Apr 04 '22 at 07:34
  • @JonP Nope, they just need to decode the HTML encoding and make sure the page response charset is `UTF-8`. – user692942 Apr 04 '22 at 11:06
  • @JK can classic ASP support substring and split javascript functions ? – Mike Steelson Apr 04 '22 at 14:01
  • @MikeSteelson Classic ASP is server-side post processing language. It supports active scripting languages like VBScript and JScript (ECMAScript v3). – user692942 Apr 04 '22 at 15:12

1 Answers1

-1

In google sheet, if you want to upload a csv file whith html special characters, try

function importCsvFromIdCodeHtml() {
  var id = '13tlu9eYb5Ty3L45_RKibsfHjOXyUxeX3'; // adapt the id to your own file id
  var csv = DriveApp.getFileById(id).getBlob().getDataAsString();
  var csvData = Utilities.parseCsv(csv);
  csvData.forEach((rng, row) => {
    rng.forEach((r, col) => {
      code = (ExtractAllRegex(r, '&#([0-9]+);', 1))
      code.forEach(function (c) { r = r.replace(`&#${c};`, String.fromCharCode(c)) })
      csvData[row][col] = r
    })
  });
  var f = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  f.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
}
function ExtractAllRegex(input, pattern,groupId) {
  return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]);
}

  var txt = '香'
  var char = String.fromCharCode(txt.match(/&#(\d+);/)[1]);
  console.log(char)
Mike Steelson
  • 14,650
  • 2
  • 5
  • 20
  • What does this have to do with server-side processing in Classic ASP? – user692942 Apr 04 '22 at 11:04
  • nothing with ASP ...but the demand is not very specific and among all sentences, J K ask `in the exported CSV file, so that when I open up this file in Excel or Google Sheets, these Chinese characters will show up properly` thx to not downvote ... I add also a snippet to show how to transform characters – Mike Steelson Apr 04 '22 at 13:54
  • i think that it could be interested to keep the format xxxx; in the server and decode when necessary in excel or google sheets ! – Mike Steelson Apr 04 '22 at 14:07
  • 1
    HTML character encoding is what people use when they don’t understand how to encode data properly. There is no reason to store unicode data in a HTML encoded string in a modern DB. – user692942 Apr 04 '22 at 16:32