0

I am writing JavaScript code that makes hashtags linkable, as follows:

str2 = str.replace(/(^|\s)#([A-Za-z0-9é_ü]+)/gi, '$1<a href="https://www.facebook.com/hashtag/$2" class="msfb-wall-auto-link" target="_blank">#$2</a>'); 

As you can see, I included special Hungarian characters such as é and ü so that they are part of the linked hashtag, but the code above breaks at those characters. When I test the same code in the w3schools.com example code editor, it works. In my local script file, however, those characters are not recognized as themselves: é looks like it is being treated as a plain "e". Why is this happening, and how can I overcome it? Please suggest ideas.

dev-m
  • I'm [not able to replicate](https://jsfiddle.net/z9L5nzp1/) what you appear to be saying, can you reproduce the issue anywhere we can see, ideally as a [stacksnippet](https://blog.stackoverflow.com/2014/09/introducing-runnable-javascript-css-and-html-code-snippets/) or [jsfiddle](https://jsfiddle.net/)? – James Thorpe Jan 08 '16 at 16:34

3 Answers

1

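Look here and here. JavaScript has some problems with Unicode in regular expressions.

If you want to match most Unicode letters, you can use the character class [\u00C0-\u1FFF\u2C00-\uD7FF\w]. It is a broad range rather than an exact list of letters, and the \w part contributes A-Z, a-z, 0-9 and _.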

So your code should look like this:

str2 = str.replace(/(^|\s)#([\u00C0-\u1FFF\u2C00-\uD7FF\w]+)/gi, '$1<a href="https://www.facebook.com/hashtag/$2" class="msfb-wall-auto-link" target="_blank">#$2</a>'); 

var str2 = 'abc #łążaf3234 efg'.replace(/(^|\s)#([\u00C0-\u1FFF\u2C00-\uD7FF\w]+)/gi, '$1<a href="https://www.facebook.com/hashtag/$2" class="msfb-wall-auto-link" target="_blank">#$2</a>'); 

alert(str2);
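As a quick check of what that character class accepts, here is a minimal sketch that collects the matched tags instead of building links. The helper name `extractHashtags` is hypothetical and only for illustration; the pattern is the one from this answer. It assumes the script file is saved as UTF-8 so the accented sample text survives.

```javascript
// Pattern from the answer: \w covers A-Z, a-z, 0-9 and _,
// and the two ranges cover a broad block of non-ASCII letters.
var HASHTAG_RE = /(^|\s)#([\u00C0-\u1FFF\u2C00-\uD7FF\w]+)/g;

// Hypothetical helper: returns the tag names (without '#') found in text.
function extractHashtags(text) {
  var tags = [];
  text.replace(HASHTAG_RE, function (match, lead, tag) {
    tags.push(tag);
    return match; // leave the text unchanged, we only collect tags
  });
  return tags;
}

// Logs the three tags: abc_123, élő and fürdő.
console.log(extractHashtags('#abc_123 #élő #fürdő plain'));
```

Because the ranges are broad, they may also admit some characters that are not letters, so this is a convenience rather than a strict letter check.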
Are
  • Can you please add examples using my code? I'm having trouble understanding. – dev-m Jan 08 '16 at 16:48
  • It looks like your code is working, but I still don't understand the links you provided. Does your regexp also match the characters I mentioned in my code (A-Za-z0-9_)? Are there any disadvantages that can occur? – dev-m Jan 09 '16 at 14:13
  • Are you there? Please answer my question above. – dev-m Jan 11 '16 at 00:10
  • @dev-m you need to add `0-9` and `_` to the above regex. I've only shown you how to match every unicode letter. – Are Jan 12 '16 at 12:47
0

You have to list the special characters explicitly, for example [A-Za-z0-9éüíóþæöÉÚÍÓÞÆÖ] (these are Icelandic characters), or you could use \S to match any non-whitespace character.
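Applied to the question, the explicit-list approach might look like the sketch below. The list of accented Hungarian letters and the names `HU_LETTERS` and `linkHashtags` are assumptions added for illustration, and the file is assumed to be saved as UTF-8.

```javascript
// Assumed list of accented Hungarian letters, lower and upper case.
var HU_LETTERS = 'áéíóöőúüűÁÉÍÓÖŐÚÜŰ';

// Same pattern as in the question, with the explicit letters appended.
var hashtagPattern = new RegExp('(^|\\s)#([A-Za-z0-9_' + HU_LETTERS + ']+)', 'g');

// Hypothetical helper: wraps each hashtag in the link markup from the question.
function linkHashtags(str) {
  return str.replace(
    hashtagPattern,
    '$1<a href="https://www.facebook.com/hashtag/$2" class="msfb-wall-auto-link" target="_blank">#$2</a>'
  );
}

console.log(linkHashtags('Szia #fürdő és #Élet'));
```

The \S alternative is shorter, but it would also pull trailing punctuation into the tag, for example the comma in "#tag,".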

Thorgeir
  • Can you please add examples using my code? I'm having trouble understanding. – dev-m Jan 08 '16 at 16:50
  • I don't think the "é_ü" works in a regex. You have to list every special character in the Hungarian alphabet, [a-z0-9áéëéíóöúü]. The "i" in "/ig" makes the match case-insensitive, so you don't need the uppercase versions. This works for Icelandic at least. – Thorgeir Jan 11 '16 at 09:52
  • Even if I include all the Hungarian letters, the behavior is the same as I described in my question. – dev-m Jan 11 '16 at 12:56
  • Make sure you save your file with UTF-8 or Unicode encoding. – Thorgeir Jan 11 '16 at 15:40
-1

Your best bet is to use Unicode escape sequences (like \u2665) rather than the literal character, so the pattern no longer depends on how the script file is encoded.
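For the two characters in the question, a minimal sketch could look like this. \u00E9 is é and \u00FC is ü; the function name `linkify` is hypothetical, and the sample strings are written with escapes too so that the whole file stays plain ASCII.

```javascript
// \u00E9 is é and \u00FC is ü. No non-ASCII bytes appear in this file,
// so the pattern reads the same whatever encoding the file is saved in.
var hashtagRe = /(^|\s)#([A-Za-z0-9_\u00E9\u00FC]+)/g;

// Hypothetical helper: wraps each hashtag in the link markup from the question.
function linkify(str) {
  return str.replace(
    hashtagRe,
    '$1<a href="https://www.facebook.com/hashtag/$2" class="msfb-wall-auto-link" target="_blank">#$2</a>'
  );
}

console.log(linkify('I like #caf\u00E9 and #m\u00FCvek'));
```

The input text still has to reach the script correctly decoded; the escapes only protect the pattern itself.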

John Hascall
  • Explain to me why this is wrong. His regex works typed into an online form, but if he saves his script into a local file, the wide-characters get mangled into 8-bit characters. Using unicode escapes like \uXXXX is the solution. – John Hascall Jan 08 '16 at 16:55
  • You mean like \u2665 ?? – John Hascall Jan 09 '16 at 17:04