
I would like to know whether there is any way to check if two strings are similar. My strings look the same, but they differ in byte length and in their urlencode() output.

$unit1 = '㎏';

$unit2 = 'kg';

strlen($unit1); // 3

strlen($unit2); // 2

urlencode($unit1); // %E3%8E%8F

urlencode($unit2); // kg
ToujouAya
  • Does **$unit1** contain a special character? – Prateik Darji Dec 11 '17 at 10:46
  • @PrateikDarji I don't think so. $unit1 was typed with a Japanese input keyboard and it's half-width. – ToujouAya Dec 11 '17 at 10:48
  • These strings only *look* the same in your chosen font (most fonts, in fairness). They're actually very different, as shown by the URL encoding. If you know which unusual characters you'll encounter, you could translate them on a case-by-case basis, but other than that you're out of luck. https://stackoverflow.com/questions/39948627/how-to-compare-strings-in-which-appears-similar-characters-but-different-char-co That question explores a similar problem in JavaScript. – Simon Brahan Dec 11 '17 at 10:48
  • @SimonBrahan Well, it seems there is no way to detect this, so I think I need to translate them case by case. Anyway, thank you. – ToujouAya Dec 11 '17 at 10:53
  • You _could_ try to define a mapping per symbol, but it will certainly fail with different (or specific) fonts. If I were in your shoes, I would start with a proper definition of similarity for this case and narrow the scope, so that the task is well constrained and therefore solvable. – Alma Do Dec 11 '17 at 12:02
  • @AlmaDo That is what I am doing. Anyway, thanks for your suggestion. – ToujouAya Dec 12 '17 at 02:05
  • Possible duplicate of [Comparing two unicode strings in PHP](https://stackoverflow.com/questions/6855425/comparing-two-unicode-strings-in-php) – ventaquil Dec 20 '17 at 14:59

1 Answer


Because look-alike characters are a known security problem, lists of them already exist.

One I found in this answer: Find similar ASCII character in Unicode

http://www.unicode.org/Public/security/latest/confusables.txt

Some info about it on Wikipedia: https://en.wikipedia.org/wiki/Homoglyph

Another link: https://github.com/codebox/homoglyph
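
If you only need to handle a known set of unit symbols, the per-symbol mapping suggested in the comments could be sketched roughly as below. This is a minimal sketch, not a complete solution: `UNIT_LOOKALIKES`, `canonicalUnit()` and `unitsLookAlike()` are hypothetical names made up for illustration, and the table holds only a few sample entries that you would have to extend yourself (for example from the lists linked above). As an extra step that is my own assumption, it also applies Unicode NFKC normalization when the intl extension's `Normalizer` class happens to be available, since that folds many compatibility characters; the mapping works without it.

```php
<?php
// Hypothetical per-symbol table: each key is a look-alike character and
// each value is the plain form it should be compared as. Sample entries only.
const UNIT_LOOKALIKES = [
    "\u{338F}" => 'kg', // ㎏ SQUARE KG
    "\u{338E}" => 'mg', // ㎎ SQUARE MG
    "\u{339D}" => 'cm', // ㎝ SQUARE CM
    "\u{FF4B}" => 'k',  // ｋ FULLWIDTH LATIN SMALL LETTER K
    "\u{FF47}" => 'g',  // ｇ FULLWIDTH LATIN SMALL LETTER G
];

// Reduce a string to the form used for comparison.
function canonicalUnit(string $s): string
{
    // Optional step (assumption): use NFKC normalization if ext-intl is loaded.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_KC);
        if (is_string($normalized)) {
            $s = $normalized;
        }
    }

    // Replace every look-alike listed in the table with its plain form.
    return strtr($s, UNIT_LOOKALIKES);
}

// Two strings count as "similar" when their canonical forms are identical.
function unitsLookAlike(string $a, string $b): bool
{
    return canonicalUnit($a) === canonicalUnit($b);
}

var_dump(unitsLookAlike('㎏', 'kg')); // bool(true)
```

Anything that is not in the table (and not covered by normalization) still compares as different, so this only works once you have decided which characters you want to treat as equal.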

user5542121