I want to test whether a string contains a given word. For that I have this regex: `/\bde\b/gi`

If my string is "Comida de cão", it works.

But with a string like "Necessidade de adeus depois", it also matches the "de" in "necessidade", "adeus" and "depois".

Also, when I try to match a word with an accent at its start or end, as in "é a vida", using `/\bé\b/gi`, nothing is found. But a word with an accent in the middle is found: in the string "O nível", `/\bnível\b/gi` matches the right word.
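A small sketch, runnable in Node or a browser console, that reproduces the behavior described above. The key fact is that JavaScript's `\w` (and therefore `\b`) only counts ASCII `[A-Za-z0-9_]` as word characters:

```javascript
// JavaScript's \w is ASCII-only: [A-Za-z0-9_]. "é" is therefore a non-word char.
const accentIsWordChar = /\w/.test("é");

// \b needs a word char on exactly one side. At the start of "é a vida" both
// sides (start of input, "é") are non-word, so \bé\b cannot match there.
const accentAtEdge = /\bé\b/i.test("é a vida");

// "nível" starts and ends with ASCII letters, so both \b assertions still hold.
const accentInMiddle = /\bnível\b/i.test("O nível");

// Side effect of the same rule: "é" acts like a separator,
// so \bvida\b also matches inside "évida".
const falsePositive = /\bvida\b/i.test("évida");

console.log(accentIsWordChar, accentAtEdge, accentInMiddle, falsePositive);
// false false true true
```

The last check is the flip side of the same rule: because `é` counts as a separator, `\b` can also report a boundary in the middle of a word.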

I've searched for similar issues but still haven't managed to solve my problem.

By the way, on regex101 the first issue doesn't happen and it works as expected.

Thanks!

Edit: Added my code

```javascript
var myRe = new RegExp("\\b" + query + "\\b","iu");
var match = myRe.test("Necessidade de adeus depois");
```
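Since `test()` only reports whether there is at least one match, and "Necessidade de adeus depois" does contain a standalone "de", a `true` result here doesn't by itself show which "de" matched. A sketch that lists the match positions with `String.prototype.matchAll` (which needs the `g` flag); the value `query = "de"` is assumed for illustration:

```javascript
const query = "de"; // assumed value, for illustration
const text = "Necessidade de adeus depois";

// The question's regex: test() only answers "is there at least one match?"
const myRe = new RegExp("\\b" + query + "\\b", "iu");
const found = myRe.test(text); // true: the string contains a standalone "de"

// To see *where* it matches, build a global copy and use matchAll.
const allRe = new RegExp("\\b" + query + "\\b", "giu");
const positions = [...text.matchAll(allRe)].map(m => m.index);

// found is true, positions is [12]: only the standalone "de" matches,
// not the ones inside "Necessidade", "adeus" or "depois".
console.log(found, positions);
```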
ninjacow55
  • Just tried on [Regex101](https://regex101.com/) and I can't reproduce, I get correct results – ctwheels Aug 31 '17 at 15:23
  • The problem is most likely your code, which you're supposed to show. –  Aug 31 '17 at 15:23
  • Can you post the code where you're using it? – intrepid_em Aug 31 '17 at 15:25
  • Cannot reproduce the first problem either, it works here. For the second, there's a problem with `\b` and Unicode chars, [see here](https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters). By the way, it can be good to use the `u` option for Unicode char support – Kaddath Aug 31 '17 at 15:31
  • @Kaddath is correct about using the `u` option for your second issue: `/\bé\b/giu` works – ctwheels Aug 31 '17 at 15:32
  • @ctwheels As in the link I posted, it also works fine for me on regex101. I already added my code – ninjacow55 Aug 31 '17 at 15:36
  • @Kaddath I tried adding the `u` but it didn't solve it – ninjacow55 Aug 31 '17 at 15:37
  • @ChrisG I already edited and added the code – ninjacow55 Aug 31 '17 at 15:52
  • It works, use the Python flavor – Abr001am Nov 23 '17 at 21:21
  • @Idle how can I do that? – ninjacow55 Nov 25 '17 at 11:01
  • @ninjacow55 JavaScript is known to have Unicode problems; Italian, French, Spanish and similar languages are likely not regex-parsable. I was just pointing out that your regex syntax is fine; the problem is using it in JS. The solution is below. – Abr001am Nov 25 '17 at 16:25
  • @Idle Well, yes, the answer below makes it work in plain JS, but I'm using a framework called Vue.js and there it doesn't work properly... Do you know anything about this? I've searched the web and didn't find anything... – ninjacow55 Nov 28 '17 at 12:33
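The comments above suggest adding the `u` flag. A quick check, sketched below, suggests that in JavaScript `u` changes how the pattern is parsed (code points, property escapes) but not what `\w` and `\b` consider a word character. Python's `re`, by contrast, treats `\w` and `\b` as Unicode-aware for `str` patterns by default, which is consistent with the "use the Python flavor" comment about regex101:

```javascript
// Same pattern with and without the u flag: \b is ASCII-based in both cases.
const withoutU = /\bé\b/i.test("é a vida");
const withU = /\bé\b/iu.test("é a vida");

// What u does unlock is Unicode property escapes (ES2018), e.g. \p{L} = any letter.
const eIsLetter = /\p{L}/u.test("é");

console.log(withoutU, withU, eIsLetter); // false false true
```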

1 Answer

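The closest thing to a working solution that I have found is this. As stated in my comment, there seems to be a problem with word boundaries and Unicode characters: in JavaScript, `\b` only treats `[A-Za-z0-9_]` as word characters, so accented letters count as non-word characters.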

This solution can probably be improved, but it uses a positive lookahead (which doesn't consume characters) to test for either the start `^` or end `$` of the string, or a non-word character:

```javascript
// accent at the start or end of the word
/(?=^|\W)é(?=$|\W)/giu

// no accent at the start or end of the word
/\bnível\b/giu
```
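A caveat that may be worth checking with the first pattern: a lookahead placed *before* `é` inspects `é` itself, which `\W` already matches in JavaScript, so `(?=^|\W)` is always satisfied there and never looks at the preceding character. The sketch below compares it with a lookbehind `(?<=^|\W)`, an ES2018 feature swapped in here (it is not the technique used in this answer, and it was not available in all engines in 2017):

```javascript
// The answer's pattern: the leading lookahead sees "é" itself (a \W char),
// so it never checks what comes before "é".
// (g dropped so repeated test() calls don't carry lastIndex over.)
const viaLookahead = /(?=^|\W)é(?=$|\W)/iu;

// Swapped-in variant: a lookbehind checks the character *before* "é".
const viaLookbehind = /(?<=^|\W)é(?=$|\W)/iu;

const results = {
  standalone: [viaLookahead.test("é a vida"), viaLookbehind.test("é a vida")],
  insideWord: [viaLookahead.test("café"), viaLookbehind.test("café")],
};
console.log(results);
// standalone: [true, true], insideWord: [true, false]
```

So both find the standalone "é", but only the lookbehind version rejects the "é" at the end of "café". The same applies to the `(?=^|\W)éternel` form mentioned below.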

EDIT: yes, that's true, it doesn't work with multiple chars. If you can test the length of what you want to match, you can still build different cases depending on whether you search for one or multiple chars.

EDIT2: actually, the last edit is wrong. It doesn't depend on the length but on whether the accented char is at the word boundary or not. So it would be `/(?=^|\W)éternel\b/giu` for "éternel" and `/\bné(?=$|\W)/giu` for "né".

Updated regex example: https://regex101.com/r/6v2gId/3
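Instead of choosing between `\b` and a lookaround depending on where the accent sits, one option is to define "word character" yourself with Unicode property escapes and apply it on both sides of any query. A sketch under these assumptions: `wholeWordRegex` is a hypothetical helper, not part of this answer; a word character is taken to be a letter (`\p{L}`), combining mark (`\p{M}`), digit (`\p{N}`) or underscore; and the runtime supports ES2018 lookbehind and property escapes (lookbehind only reached Safari in 16.4):

```javascript
// Hypothetical helper: whole-word, Unicode-aware match for a literal word.
function wholeWordRegex(word, flags = "iu") {
  // Escape regex metacharacters so the query is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Assumed definition of a word character: letter, combining mark, digit or "_".
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  // \p{...} escapes only work with the u flag, so make sure it is present.
  const safeFlags = flags.includes("u") ? flags : flags + "u";
  // Match the word only when it is not preceded or followed by a word character.
  return new RegExp(`(?<!${wordChar})${escaped}(?!${wordChar})`, safeFlags);
}

console.log(wholeWordRegex("de").test("Necessidade de adeus depois")); // true
console.log(wholeWordRegex("de").test("necessidade adeus depois"));    // false
console.log(wholeWordRegex("é").test("é a vida"));                     // true
console.log(wholeWordRegex("é").test("café"));                         // false
console.log(wholeWordRegex("éternel").test("xéternel"));               // false
console.log(wholeWordRegex("né").test("il est né"));                   // true
```

Because both boundaries are lookarounds, the match itself is just the word, as with `\b`, and the same helper covers accents at the start, middle or end.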

EDIT3: a small example of what I tried, to answer your last comment:

```javascript
// Unaccented query: \b works because "de" starts and ends with ASCII letters.
// No g flag: with g, reusing the same regex in test() carries lastIndex over.
var query = 'de';
var myRe = new RegExp("\\b" + query + "\\b", "iu");
document.getElementById('res1').textContent = myRe.test("determinado de necessidade de comer é de");
document.getElementById('res2').textContent = myRe.test("determinado necessidade comer é e");

// Query ending in an accented char: the trailing \b becomes a lookahead.
query = 'dé';
myRe = new RegExp("\\b" + query + "(?=$|\\W)", "iu");
document.getElementById('res3').textContent = myRe.test("déterminado dé necessidadé de comer é de");
document.getElementById('res4').textContent = myRe.test("déterminado necessidadé comer é de");
```

```html
<span>test with "\bde\b":</span><br/>
<span>for "determinado de necessidade de comer é de":</span><span id="res1"></span><br/>
<span>for "determinado necessidade comer é e":</span><span id="res2"></span><br/><br/>
<span>test with "\bdé(?=$|\W)":</span><br/>
<span>for "déterminado dé necessidadé de comer é de":</span><span id="res3"></span><br/>
<span>for "déterminado necessidadé comer é de":</span><span id="res4"></span>
```
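One pitfall worth knowing when a regex object is reused for several `test()` calls, as in the snippet above: with the `g` flag, `test()` starts searching at the regex's `lastIndex` and moves it past each match, so calling it again on the same string can return `false`. The question's own `"iu"` flags avoid this. A minimal sketch:

```javascript
// With g, test() resumes at lastIndex and moves it past each match.
const globalRe = /\bde\b/giu;
const first = globalRe.test("de");   // true, lastIndex is now 2
const second = globalRe.test("de");  // false: search starts at 2, lastIndex resets to 0
const third = globalRe.test("de");   // true again

// Without g, test() ignores lastIndex and always starts at 0.
const statelessRe = /\bde\b/iu;
const stateless = [statelessRe.test("de"), statelessRe.test("de")];

console.log(first, second, third, stateless); // true false true [true, true]
```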
Kaddath
  • Thanks for your answer! It does work for a single accented character, but not for anything else: if you write "vida" or "bvéb", it won't be found :( – ninjacow55 Aug 31 '17 at 16:09
  • Edited the answer: it would work if you can test the length of what you search for, though it should be verified – Kaddath Aug 31 '17 at 16:31
  • It works for the second issue, but the first issue persists. I don't know if using a RegExp object changes anything; I can't see why this happens when it works just fine on regex101 – ninjacow55 Sep 01 '17 at 21:51
  • I just tested your code, manually setting `var query = 'de';`, with the test string `"determinado de necessidade de comer é e de"`, and the match was true. Could what's inside `query` be wrong or badly formatted? Where do you get its value from? (EDIT: do you use a framework?) – Kaddath Sep 05 '17 at 08:55
  • Was the match true only for "de", or also for "necessidade"? I get the value from user input, and yes, I'm using Vue.js – ninjacow55 Sep 15 '17 at 11:01
  • It was only true for "de". I added the test code I wrote for this; it seems to work quite well here, and the match is true only when the whole word is found. I don't know Vue.js well enough to tell. Have you tried displaying the raw user input to see if it is formatted in some way? – Kaddath Sep 15 '17 at 13:28
  • Yup, it works fine if not using Vue.js... If I console.log the user input, it appears ok. What a weird situation – ninjacow55 Sep 20 '17 at 09:42
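Since the query comes from user input and `console.log` shows it as expected, one thing that might be worth ruling out is an invisible difference: an accented letter entered as a base letter plus a combining accent looks identical to the precomposed one, and a non-breaking space looks like a regular space. Whether this is the cause in the Vue.js case is not established in this thread. A sketch with a hypothetical `codePoints` debugging helper, plus `String.prototype.normalize("NFC")` to compose such sequences:

```javascript
// Hypothetical debugging helper: list each code point of a string as "U+XXXX".
function codePoints(s) {
  return Array.from(s, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

const precomposed = "\u00E9";  // "é" as a single code point
const decomposed = "e\u0301";  // "e" + combining acute accent, renders as "é"

console.log(precomposed === decomposed);              // false
console.log(codePoints(decomposed));                  // ["U+0065", "U+0301"]
console.log(codePoints(decomposed.normalize("NFC"))); // ["U+00E9"]
console.log(codePoints("de\u00A0"));                  // ["U+0064", "U+0065", "U+00A0"]
```

Logging `codePoints(query)` right before building the `RegExp` would show whether the string really is what it appears to be.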