Questions tagged [homoglyph]

A homoglyph is one of several graphemes, characters, or glyphs that cannot be easily visually differentiated.

Confusion between homoglyphs can give rise to security concerns.

7 questions
20 votes, 2 answers

What Unicode normalization (and other processing) is appropriate for passwords when hashing?

If I accept full Unicode for passwords, how should I normalize the string before passing it to the hash function? Goals: Without normalization, if someone sets their password to "mañana" (ma\u00F1ana) on one computer and tries to log in with…
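To make the concern concrete, here is a minimal sketch of normalizing before hashing. It assumes NFKC purely for illustration (NFC or another form may be the better choice for a given system), and the helper names `normalize_password` and `hash_password` are hypothetical. PBKDF2 is used only because it ships with the standard library.

```python
import hashlib
import unicodedata


def normalize_password(password: str) -> bytes:
    """Normalize to NFKC (an illustrative choice) and encode as UTF-8."""
    return unicodedata.normalize("NFKC", password).encode("utf-8")


def hash_password(password: str, salt: bytes) -> bytes:
    """Hash the normalized password; iteration count is a placeholder."""
    return hashlib.pbkdf2_hmac("sha256", normalize_password(password), salt, 100_000)


composed = "ma\u00F1ana"     # precomposed "ñ"
decomposed = "man\u0303ana"  # "n" followed by a combining tilde
```

The two strings differ as code point sequences, but after normalization they encode to the same bytes and therefore hash to the same value for a given salt.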
8 votes, 2 answers

Homoglyph attack detection in email phishing

Main question: I am working on an API in Java that needs to detect the use of brands (e.g. PayPal, Mastercard, etc.) in phishing emails. Obviously there are different strategies that attackers use to target these brands so that they are harder to…
5 votes, 1 answer

Efficient algorithm to find all "character-equal" strings?

How can we write an efficient function that outputs "homoglyph equivalents" of an input string? Example 1 (pseudo-code): homoglyphs_list = [ ["o", "0"], // "o" and "0" are homoglyphs ["i", "l", "1"] // "i"…
Pacerier
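One straightforward way to approach the question above is a Cartesian product over the per-character alternatives. This is only a sketch: the two groups are taken from the question's example, while `build_lookup` and `homoglyph_equivalents` are hypothetical names. The output grows exponentially with the number of characters that have alternatives, so it is practical only for short inputs.

```python
from itertools import product

# Groups taken from the question's example list.
HOMOGLYPH_GROUPS = [["o", "0"], ["i", "l", "1"]]


def build_lookup(groups):
    """Map every character to the full group it belongs to."""
    lookup = {}
    for group in groups:
        for ch in group:
            lookup[ch] = group
    return lookup


def homoglyph_equivalents(text, groups=HOMOGLYPH_GROUPS):
    """Return every string obtained by swapping characters within their group."""
    lookup = build_lookup(groups)
    # Characters without a group only have themselves as an alternative.
    choices = [lookup.get(ch, [ch]) for ch in text]
    return ["".join(combo) for combo in product(*choices)]
```

For example, `homoglyph_equivalents("hi")` yields the three variants "hi", "hl" and "h1", and the input itself is always part of the result.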
2 votes, 2 answers

Allow only letters and digits in strings but without confusables

Say I want usernames to only consist of letters and digits regardless of language. I think I might accomplish this with the following regex parts (?>\p{L}[\p{Mn}\p{Mc}]*) //match any letter, including those consisting of two code points \p{Nd}…
user764754
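The character-class part of this idea can be sketched without a regex engine that supports `\p{...}`, by checking Unicode general categories directly. The function name `is_letters_and_digits` is hypothetical, and the sketch covers only the letter, combining-mark and decimal-digit rules visible in the excerpt; it does not filter confusables.

```python
import unicodedata


def is_letters_and_digits(name: str) -> bool:
    """Accept letters (optionally followed by Mn/Mc marks) and decimal digits."""
    after_letter = False
    for ch in name:
        category = unicodedata.category(ch)
        if category.startswith("L"):
            after_letter = True
        elif category in ("Mn", "Mc"):
            # A combining mark is only valid directly after a letter cluster.
            if not after_letter:
                return False
        elif category == "Nd":
            after_letter = False
        else:
            return False
    # Reject the empty string.
    return bool(name)
```

With this check, "man\u0303ana" (a letter plus a combining tilde) passes, while a leading combining mark, a space or an underscore is rejected.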
1 vote, 1 answer

Is there a function to compare two strings using a custom homoglyphs list?

I need a function that compares two strings and outputs an edit distance like Levenshtein, but only if the characters are homoglyphs in cursives. I have a list of those homoglyphs so I could feed a custom list to this…
James McGrath
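One possible reading of this request is a Levenshtein distance in which substituting one homoglyph for another in the same group is cheaper than an ordinary substitution. The sketch below assumes that interpretation; `homoglyph_distance` and its `homoglyph_cost` parameter are hypothetical, and the groups are supplied by the caller.

```python
def homoglyph_distance(a, b, groups, homoglyph_cost=0):
    """Levenshtein distance with a reduced cost for same-group substitutions."""
    # Assign each character the index of the group it belongs to.
    group_of = {}
    for index, group in enumerate(groups):
        for ch in group:
            group_of[ch] = index

    def substitution_cost(x, y):
        if x == y:
            return 0
        if x in group_of and group_of.get(x) == group_of.get(y):
            return homoglyph_cost
        return 1

    # Classic dynamic programme, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                              # deletion
                current[j - 1] + 1,                           # insertion
                previous[j - 1] + substitution_cost(ca, cb),  # substitution
            ))
        previous = current
    return previous[-1]
```

With `homoglyph_cost=0`, strings that differ only by listed homoglyphs are at distance 0, and any other difference is counted as in plain Levenshtein.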
1 vote, 0 answers

Replace homoglyphs in a PHP string

I'm working on an anti-spam bot which struggles to decode homoglyphs. Here is a sample message: ɪ ᴄᴀɴ'ᴛ ꜱᴛᴏᴘ ꜱʜᴀʀɪɴɢ ᴛʜᴇ ɢᴏᴏᴅ ɴᴇᴡꜱ ᴀʙᴏᴜᴛ ꜰᴏʀᴇx ᴍᴀʀᴋᴇᴛ ᴄᴏᴍᴘᴀɴʏ. ᴡʜᴇɴ ɪ ꜰɪʀꜱᴛ ʜᴇᴀʀᴅ ɪᴛ, ɪ ᴡᴀꜱ ᴀꜰʀᴀɪᴅ ʙᴜᴛ ʟᴀᴛᴇʀ ꜱᴜᴍᴍᴏɴᴇᴅ ᴄᴏᴜʀᴀɢᴇ ᴀɴᴅ ᴍᴀᴅᴇ ᴀ ᴍᴏᴠᴇ ᴡɪᴛʜ $200 ɪ…
Math
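The question concerns PHP, but the general technique is language-independent: fold each look-alike to a plain letter through a translation table before running the spam checks. The Python sketch below uses a hand-written, deliberately partial table covering only the first few words of the sample message; `SMALL_CAPS_MAP` and `fold_homoglyphs` are hypothetical names, and a real filter would need a much larger table.

```python
# Partial, hand-written mapping from small-capital look-alikes to ASCII letters.
SMALL_CAPS_MAP = {
    "ɪ": "i",
    "ᴄ": "c",
    "ᴀ": "a",
    "ɴ": "n",
    "ᴛ": "t",
    "ꜱ": "s",
    "ᴏ": "o",
    "ᴘ": "p",
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(SMALL_CAPS_MAP)


def fold_homoglyphs(text: str) -> str:
    """Replace every mapped look-alike; unmapped characters pass through."""
    return text.translate(_TABLE)
```

Applied to the opening of the sample message, `fold_homoglyphs("ɪ ᴄᴀɴ'ᴛ ꜱᴛᴏᴘ")` gives "i can't stop", after which ordinary keyword matching can run on the folded text.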
-2 votes, 1 answer

Homoglyph regex detection and SQL collation

I have a table containing some regexes. By default the table was created using utf8mb4_general_ci collation. Everything is fine until I try to add a regex containing homoglyphs like this one. The regex // once stored in my database will simply…
Math