
I have a problem when a user inputs a string in special Unicode characters, like 𝘁𝘂𝘆𝗲𝗻𝗱𝘂𝗻𝗴: my system cannot match it with the string "tuyendung" written in ASCII. How can I normalize the input string to ASCII before storing it in the database?

Sample Input:

𝘁𝘂𝘆𝗲𝗻𝗱𝘂𝗻𝗴

(Char codes, as UTF-16 code units: 0xd835, 0xde01, 0xd835, 0xde02, 0xd835, 0xde06, 0xd835, 0xddf2, 0xd835, 0xddfb, 0xd835, 0xddf1, 0xd835, 0xde02, 0xd835, 0xddfb, 0xd835, 0xddf4)

Expected output:

tuyendung

(Char codes: 0x74, 0x75, 0x79, 0x65, 0x6e, 0x64, 0x75, 0x6e, 0x67)
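The char codes in the sample input are UTF-16 code units: each 0xd835 is a high surrogate that pairs with the unit after it to form one code point above U+FFFF. The sketch below (plain PHP, with `utf16UnitsToCodePoints` as a hypothetical helper name) decodes the pairs and shows that they are Mathematical Sans-Serif Bold small letters, which occupy U+1D5EE to U+1D607 in alphabetical order. The offset-from-U+1D5EE mapping only holds for that one Unicode block, so treat it as a diagnostic for this input, not as a general normalizer.

```php
<?php
// Hypothetical helper: turns UTF-16 code units into Unicode code points.
// A high surrogate (0xD800-0xDBFF) followed by a low surrogate
// (0xDC00-0xDFFF) encodes one code point above U+FFFF.
function utf16UnitsToCodePoints(array $units): array
{
    $codePoints = [];
    $n = count($units);
    for ($i = 0; $i < $n; $i++) {
        $unit = $units[$i];
        $next = $units[$i + 1] ?? null;
        if ($unit >= 0xD800 && $unit <= 0xDBFF
            && $next !== null && $next >= 0xDC00 && $next <= 0xDFFF) {
            $codePoints[] = 0x10000 + (($unit - 0xD800) << 10) + ($next - 0xDC00);
            $i++; // the low surrogate has been consumed too
        } else {
            $codePoints[] = $unit; // BMP character (or an unpaired surrogate)
        }
    }
    return $codePoints;
}

// The char codes from the sample input.
$units = [0xd835, 0xde01, 0xd835, 0xde02, 0xd835, 0xde06, 0xd835, 0xddf2,
          0xd835, 0xddfb, 0xd835, 0xddf1, 0xd835, 0xde02, 0xd835, 0xddfb,
          0xd835, 0xddf4];

$codePoints = utf16UnitsToCodePoints($units);

// U+1D5EE is MATHEMATICAL SANS-SERIF BOLD SMALL A; a-z follow it in order.
// This offset mapping is only valid for that block (U+1D5EE to U+1D607).
$letters = '';
foreach ($codePoints as $cp) {
    if ($cp >= 0x1D5EE && $cp <= 0x1D607) {
        $letters .= chr(ord('a') + $cp - 0x1D5EE);
    }
}

echo implode(' ', array_map(fn ($cp) => sprintf('U+%X', $cp), $codePoints)), "\n";
echo $letters, "\n"; // tuyendung
```

The point is that these characters only look like ASCII letters; as code points they are entirely different values, which is why a plain string comparison with "tuyendung" fails.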

Wai Ha Lee
Vũ Nhật Anh
  • Does this answer your question? [Replacing accented characters php](https://stackoverflow.com/questions/3371697/replacing-accented-characters-php) – kmoser Jun 04 '20 at 07:07

2 Answers


It looks like the //TRANSLIT option can do the trick here.

<?php

$input = '𝘁𝘂𝘆𝗲𝗻𝗱𝘂𝗻𝗴'; // sample input from the question
echo iconv('UTF-8', 'US-ASCII//TRANSLIT', $input);

This turns (what I think are?) math symbols like 𝘁 into t.
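If the intl extension is available, Unicode NFKC normalization is another option worth trying. The math-styled letters from the question have compatibility decompositions to plain ASCII letters, so `Normalizer::normalize()` with `Normalizer::FORM_KC` maps them to `tuyendung` without relying on iconv's transliteration tables, whose results can vary with the iconv implementation (and, with glibc, the current locale). A minimal sketch, assuming PHP 7+ with ext-intl; NFKC does not strip accents, so the last step only checks whether the result is really ASCII.

```php
<?php
// Assumes the intl extension (it provides the Normalizer class).
// The sample input, written with \u{...} escapes: "tuyendung" in
// Mathematical Sans-Serif Bold letters.
$input = "\u{1D601}\u{1D602}\u{1D606}\u{1D5F2}\u{1D5FB}\u{1D5F1}\u{1D602}\u{1D5FB}\u{1D5F4}";

// NFKC replaces compatibility characters (math-styled letters,
// full-width letters, ligatures such as U+FB01) with their plain forms.
$normalized = Normalizer::normalize($input, Normalizer::FORM_KC);
if ($normalized === false) {
    throw new RuntimeException('Input is not valid UTF-8');
}

// NFKC keeps accented letters such as "é", so check before treating
// the result as ASCII.
$isAscii = preg_match('/[^\x00-\x7F]/', $normalized) === 0;

echo $normalized, "\n"; // tuyendung
var_dump($isAscii);     // bool(true)
```

Unlike `//TRANSLIT`, NFKC never replaces a character with `?`; anything it cannot map stays as-is, which the ASCII check then reports.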

Evert

I don't know what "tuyendung" is.

But in PHP, you can convert character sets with the iconv function, or you can keep the original form in a BLOB field in the database and apply whatever transformation you need when you display it.

Maybe this gives you an idea.
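The "keep the original form" idea from this answer can be paired with a normalized key: store the user's input unchanged, and also store a derived value that you search and compare on. A minimal sketch, assuming ext-intl; `lookupKey` is a hypothetical name, and the pipeline (NFKC, then NFD with combining marks stripped, then lowercasing) is one common choice, not something this answer prescribes.

```php
<?php
// Assumes the intl extension. lookupKey() is a hypothetical helper that
// derives a comparison key; the original string is stored unchanged.
function lookupKey(string $s): string
{
    // 1. NFKC: math-styled, full-width, etc. letters become plain letters.
    $s = Normalizer::normalize($s, Normalizer::FORM_KC);
    if ($s === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // 2. NFD: split accented letters into base letter + combining marks.
    $s = Normalizer::normalize($s, Normalizer::FORM_D);
    // 3. Remove the combining marks (Unicode category Mn).
    $s = preg_replace('/\p{Mn}+/u', '', $s);
    // 4. Lowercase. strtolower() only touches ASCII letters as of PHP 8.2;
    //    on older versions it follows the current locale.
    return strtolower($s);
}

$original = "\u{1D601}\u{1D602}\u{1D606}\u{1D5F2}\u{1D5FB}\u{1D5F1}\u{1D602}\u{1D5FB}\u{1D5F4}";

var_dump(lookupKey($original) === lookupKey('tuyendung')); // bool(true)
echo lookupKey("Tuy\u{1EC3}n"), "\n";                       // tuyen
```

Letters that have no Unicode decomposition, such as Vietnamese đ (U+0111), pass through step 3 unchanged, so decide separately whether they need an explicit mapping.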