30

The symbol is: ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ

What's so special about this symbol and where did it come from?

What can be done to validate against such input? Or even better, how can such symbols be displayed properly (i.e. without letting them overlap other elements)?

MD XF
chaosifier
  • what's with the line? or just on my screen? – Drixson Oseña Dec 18 '15 at 06:50
  • 5
    @DrixsonOseña:- I guess that's what OP is asking! It's there on my screen as well – Rahul Tripathi Dec 18 '15 at 06:52
  • 1
    @RahulTripathi I had no idea :) – Drixson Oseña Dec 18 '15 at 06:52
  • 1
    That is some kind of modifier for a character. Normally you would just use one, but you can make crazy combinations, e.g. you could enter the letter ä directly or as an "a" plus the two-dot (diaeresis) modifier. – rekire Dec 18 '15 at 06:54
  • 1
    @chaosifier:- Maybe because you have not mentioned where you got this symbol from? What's the source, etc. (Not the downvoter, btw.) – Rahul Tripathi Dec 18 '15 at 06:54
  • 8
    Guys on 9gag have been using this symbol for a while because of its weird behavior. I tried to find more about it on google but google replied with a 400 error. So i had to post this question here. – chaosifier Dec 18 '15 at 06:57
  • 6
    _but google replied with a 400 error_ - that's kinda interesting in itself! I wonder why that happens – Krease Dec 18 '15 at 07:00
  • 1
    If you paste this into a password input, it pastes a lot of characters. I tried pasting it on Facebook and they won't accept it; YouTube won't paste it either :D – Drixson Oseña Dec 18 '15 at 07:00
  • 2
    @chaosifier: How did you use it on 9GAG, since Google is replying with a 404 error? For a change I checked with Bing. It returns results about some _Arabic_ character. Not sure, though, if that matches your purpose. +1 for an interesting question. – MKay Dec 18 '15 at 07:12
  • 2
    @mk08, They use it on the comments section. Thank you. – chaosifier Dec 18 '15 at 07:17
  • 7
    It's beautiful. ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ – ASA Dec 18 '15 at 13:17
  • 3
    Related: [How does Zalgo text work?](http://stackoverflow.com/questions/6579844/how-does-zalgo-text-work) – CodesInChaos Dec 18 '15 at 13:40
  • @Traubenfuchs It's f***ed up :D ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ‌​ْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ – Ahmad Alfy Dec 30 '15 at 14:47

5 Answers

20

Since this seems to be less trivial for others than I thought, here is my answer.

These are called combining diacritical marks.

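To give you an example, you can write "ä" directly as a single precomposed character (U+00E4), or as the letter "a" followed by the combining diaeresis (U+0308); both render as "ä".

You can abuse these marks by stacking them, as in "ä̈̈̈̈̈̈", which is a single letter followed by several combining diaereses.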
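To make the two spellings concrete, here is a minimal JavaScript sketch. The variable names are only illustrative, and the stack of six marks is an arbitrary choice for the demo, not a count taken from the example above.

```javascript
// Two spellings of the same visible character.
const precomposed = '\u00E4';   // "ä" as one code point
const combined = 'a\u0308';     // "a" + COMBINING DIAERESIS

// The raw strings differ, but they are equal after NFC normalization.
const sameAfterNfc = precomposed === combined.normalize('NFC');

// Stacking: one base letter followed by six combining marks (arbitrary number).
const stacked = 'a' + '\u0308'.repeat(6);
const markCount = (stacked.match(/\p{M}/gu) || []).length;

console.log(precomposed === combined, sameAfterNfc, markCount);
```

The `\p{M}` property escape matches any combining mark and needs the `u` flag, so this assumes an ES2018-capable engine.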

To protect yourself against such "Unicode attacks", you could limit the number of combining characters that are allowed to follow one another. I cannot give you an exact example since your tags don't hint at your server-side language. If you have a plain English website, you might try to limit input to ASCII characters only. However, I would not recommend that, since I would then not be allowed to sign with my name :-)

I would just limit the number of consecutive combining characters. That can be done with a regex.
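As a rough sketch of that regex idea in JavaScript (the function name `limitCombiningMarks` and the default limit of 2 are assumptions for illustration, not something from the question):

```javascript
// Hypothetical helper: keep at most `maxMarks` consecutive combining marks
// and drop the rest of each run. Requires an engine with \p{...} support (ES2018).
function limitCombiningMarks(text, maxMarks = 2) {
  // Capture the first `maxMarks` marks of a run, then match any extra marks.
  const pattern = new RegExp(`(\\p{M}{${maxMarks}})\\p{M}+`, 'gu');
  return text.replace(pattern, '$1');
}

// The symbol from the question: U+0624 followed by 161 copies of U+0652.
const symbol = '\u0624' + '\u0652'.repeat(161);
console.log(Array.from(limitCombiningMarks(symbol)).length);
```

The right limit depends on the languages you expect; some scripts legitimately stack more than one mark on a letter, so a very low limit can damage valid text.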

If you just want to prevent the characters from "breaking out" of their container, try `style="overflow:auto"` on the container, which seems to constrain how the text is rendered.

rekire
  • 3
    I hadn't noticed before that this is displayed differently in other browsers. If Rahul Tripathi is right and this character is an Arabic one (I didn't check this particular one), I could imagine that some browsers/operating systems don't have support for Arabic characters installed, so I would guess it is a bug or missing support in this case. – rekire Dec 18 '15 at 07:28
  • 1
    Since i had some more questions, and also since some members were saying that this question was not programming related, i had to update the question and uncheck your answer. Sorry for the inconvenience caused, i should have included everything in the beginning. – chaosifier Dec 18 '15 at 09:13
  • 1
    @chaosifier now you have a solution in my answer how to fix it :) – rekire Dec 18 '15 at 09:20
  • Is that how Facebook handles such input? Can the overlapping nature of the symbol be stopped without having to validate the input i.e. by using HTML/CSS alone? – chaosifier Dec 18 '15 at 09:27
  • @chaosifier:- With HTML/CSS alone, I don't think so; you can use another language like JavaScript to validate it. – Rahul Tripathi Dec 18 '15 at 09:28
  • 1
    @chaosifier I don't use Facebook, so no idea. However, the edit by "一二三" (*this is 123, I guess*) gives you a hint: put it into a div with `overflow:auto`. – rekire Dec 18 '15 at 09:29
  • 2
    @rekire, that worked like a charm. I think you could include that in your answer. I have rolled back your changes to the question to allow others to see the problem. Thanks a lot for your answer, really appreciate it. – chaosifier Dec 18 '15 at 09:47
8

I just copied the symbol to SQL Server and Visual Studio and found that the symbol got converted to

enter image description here

So it looks like a long run of the ْ character (which looks like an Arabic symbol) that the browser is not able to render properly.

The base character is the Arabic letter waw with hamza above (U+0624), and the repeated mark is the Arabic sukun (U+0652).

IE also seems to interpret the same symbol correctly:

enter image description here

So it looks like some browsers are not able to render the symbol.

EDIT:

To validate such input, you can usually apply some sort of validation (for example, restricting the user to ASCII characters only) using a language like JavaScript or PHP, which lets you restrict the input to the characters of your choice.
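The ASCII-only restriction could be sketched in JavaScript roughly like this (the name `isAsciiOnly` is made up for the example):

```javascript
// Hypothetical validator: true only if every character is in the 7-bit ASCII range.
function isAsciiOnly(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

console.log(isAsciiOnly('hello world'));                      // plain ASCII input
console.log(isAsciiOnly('\u0624' + '\u0652'.repeat(161)));    // the symbol from the question
console.log(isAsciiOnly('Ose\u00F1a'));                       // a name containing "ñ"
```

Keep in mind that such a check also rejects legitimate non-English names and words, so it only suits sites that really expect plain English input.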

"Or even better, how can such symbols be displayed properly?"

If the browser cannot render the symbol the way you have shown, then as a workaround you can constrain those characters, e.g. by putting them inside a div with `overflow:auto`, but that would not be a good solution. A better one would be to use a validation script.

Rahul Tripathi
  • 1
    Why do you think that IE is correct and firefox (which produces the line) is wrong? I'm not an expert for arabic, but my first guess would have been the other way round. The line seems like the logical consequence of stacking combining marks. – CodesInChaos Dec 18 '15 at 13:34
5

It is strange that on screen you see only one character followed by a line drawn from nowhere.

But when inspected with Chrome, it is actually a sequence of characters: the first has Unicode code point 1572 (U+0624), followed by 161 characters with code point 1618 (U+0652), which draw the line. After that comes code point 32 (the ASCII space).
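If you want to reproduce that inspection without the browser dev tools, a small JavaScript sketch like the following groups a string into runs of identical code points. The helper name `codePointRuns` is hypothetical, and the sample string is rebuilt from the numbers above rather than copied from the question.

```javascript
// Hypothetical helper: collapse a string into runs of identical code points.
function codePointRuns(text) {
  const runs = [];
  for (const ch of text) {            // iterates by code point, not UTF-16 unit
    const codePoint = ch.codePointAt(0);
    const last = runs[runs.length - 1];
    if (last && last.codePoint === codePoint) {
      last.count += 1;
    } else {
      runs.push({ codePoint, count: 1 });
    }
  }
  return runs;
}

// 1572, then 161 x 1618, then a space (32), as described above.
const sample = '\u0624' + '\u0652'.repeat(161) + ' ';
console.log(codePointRuns(sample));
```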

Sachin
  • "(Unicode) code point", not "ASCII code". – Sebastian Negraszus Dec 18 '15 at 11:27
  • True but limited. 1572 & 1618 are ASCII code (decimal system) and if you convert those two into hex you get 624 & 652 respectively. Now write them as HTML entities and you will see magic. So `&#x624;` is the Unicode for the first character that you see in the question and `&#x652;` is the Unicode for the remaining 161 characters... :D – Sachin Dec 18 '15 at 12:33
  • 4
    ASCII vs Unicode has nothing to do with decimal vs hexadecimal. ASCII is a 7-bit character set, so the largest code point is 127; there is no "ASCII code" (code point) 1572. You are talking about another character set, Unicode, so the term "ASCII" is not correct. – Sebastian Negraszus Dec 18 '15 at 12:47
  • Yes, that's true. Unicode is superset of ASCII. I have read http://stackoverflow.com/a/19212345/1659563 ... Thanks for correcting me.. – Sachin Dec 18 '15 at 12:53
2

I am not sure whether parsing your symbol in JavaScript is going to be helpful, but here is a script that does that:

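var text = 'your symbol goes here',
    // match the letter (U+0624) or the sukun diacritic (U+0652);
    // no "|" inside the character class, where it would be a literal pipe
    regex1 = /[\u0624\u0652]/g,
    result;
// note that the symbol consists of the letter and the repeated diacritics;
// to remove the symbol completely:
result = text.replace(regex1, '');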

Here is a way to see which characters the symbol contains and how they make it look so weird (it uses a JavaScript regex):

https://regex101.com/r/yW4aM8/3

You may want to use the meta tag `<meta charset="UTF-8">` so that the entire symbol is rendered correctly in all browsers, rather than only trying it in IE. I would say the only reason your symbol looks weird is that the diacritics (the repeated characters) are not used correctly; otherwise, the characters included are all legitimate. I wouldn't be surprised if this symbol is just someone trying to abuse a form input or something similar for the effect.

The symbol uses pure Arabic characters. For reference, the range of the Arabic block in Unicode is as follows (as a JavaScript regex); it is documented at unicode.org:

/[\u0600-\u06FF]/g

/[\u0600-\u06FF]/g.exec('text here');

// it's advised that you wrap the Arabic words in spans to control and show them correctly; do the following:
'text includes arabic words'.replace(/([\u0600-\u06FF]+)/g, '<span class="xyz">$1</span>');

And the CSS would be:

.xyz { unicode-bidi: bidi-override; }

I hope that helps a bit. Good luck.

KQI
0
$ echo -n ؤْْ | recode utf8..dump
UCS2   Mne   Description

0624   wH    arabic letter waw with hamza above
0652   0+    arabic sukun
0652   0+    arabic sukun
0652   0+    arabic sukun
[...lots of repeated lines...]
0652   0+    arabic sukun

That's the Arabic waw (w) with a lot of diacritics: one hamza (precomposed into the single character "waw with hamza above") and about 160 repeated sukun diacritics.
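For readers without `recode`, the "precomposed" part can be checked with a short JavaScript sketch. The helper `toCodePoints` is a made-up name, and the example only decomposes the single base character, not the whole symbol.

```javascript
// Hypothetical helper: list a string's code points in U+XXXX notation.
function toCodePoints(text) {
  return Array.from(text, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const wawWithHamza = '\u0624';                   // precomposed character
const decomposed = wawWithHamza.normalize('NFD'); // canonical decomposition

console.log(toCodePoints(wawWithHamza)); // [ 'U+0624' ]
console.log(toCodePoints(decomposed));   // [ 'U+0648', 'U+0654' ]
```

U+0648 is the plain waw and U+0654 is the combining hamza above, so the single character already carries one diacritic before any sukun is added.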

ninjalj