Java String contains a special Char but not even one more Char

Question

I am looking for every URL that is linked with the text "eye" in an HTML document. I am using a regex pattern, because a simple `contains` check is no solution here. So I got a pattern like this:

href=\"(https?://)?[a-zA-z0-9?/&=\"+-_\\.# ]*>[Ee]ye

It works... more or less. The problem is that I get more than just the URLs linked as "Eye" or "eye": I also get URLs linked as "eyebrights" or "eyewears", and that's not what I want.

Is there any way to say "get me this, but ignore it when there is more than I want"?
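A minimal reproduction of the problem, as a sketch: it compiles the pattern above unchanged with `java.util.regex.Pattern` and collects every `Matcher.find()` hit. The class name `EyeLinkRepro`, the helper `findAll` and the sample HTML are hypothetical. Both links match, including the one whose text is `eyebrights`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EyeLinkRepro {
    // The question's pattern, unchanged (written as a Java string literal).
    static final Pattern ORIGINAL =
            Pattern.compile("href=\"(https?://)?[a-zA-z0-9?/&=\"+-_\\.# ]*>[Ee]ye");

    // Collects every substring the pattern finds in the given HTML.
    static List<String> findAll(Pattern pattern, String html) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one link with text "eye", one with text "eyebrights".
        String html = "<a href=\"http://example.com/a\">eye</a>\n"
                    + "<a href=\"http://example.com/b\">eyebrights</a>";
        // Prints two matches, one per link.
        for (String match : findAll(ORIGINAL, html)) {
            System.out.println(match);
        }
    }
}
```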

To clarify, you want any URL whose text is exactly `Eye` or `eye`? Can you not match `</a>` after eye? – T. Kiley, Sep 01 '15 at 10:47
Umm... I'm not sure, but it sounds logical. Damn, I should have tried something like this. I will try it, thanks! – just_do_IT, Sep 01 '15 at 11:10
Should `eye` be the first word in the link description, or can it be placed in the middle of the text, like `blue eye`? – Pshemo, Sep 01 '15 at 11:13
Yes, eye should be the first word. I tried the solution and it works, but I have some more cases where it's not enough :) So I preferred the `\b` solution :) – just_do_IT, Sep 01 '15 at 11:25
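A sketch of the closing-tag idea the first comment appears to suggest, assuming the link text is followed directly by `</a>` with no whitespace or nested markup in between. It reuses the question's pattern with `</a>` appended; the class name `ClosingTagSketch` and the sample links are hypothetical. Because the text must then be exactly `Eye` or `eye`, a link such as `eye drops` is rejected as well, which is stricter than a word-boundary check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosingTagSketch {
    // The question's pattern with "</a>" required directly after the link text.
    static final Pattern EXACT_EYE =
            Pattern.compile("href=\"(https?://)?[a-zA-z0-9?/&=\"+-_\\.# ]*>[Ee]ye</a>");

    // Collects every substring the pattern finds in the given HTML.
    static List<String> findAll(String html) {
        List<String> matches = new ArrayList<>();
        Matcher m = EXACT_EYE.matcher(html);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample links, one per line.
        String html = "<a href=\"http://example.com/a\">eye</a>\n"
                    + "<a href=\"http://example.com/b\">eyebrights</a>\n"
                    + "<a href=\"http://example.com/c\">eye drops</a>";
        System.out.println(findAll(html)); // only the first link is found
    }
}
```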

score 2 · Answer 1 · answered Sep 01 '15 at 11:18

You should try to avoid using regex to parse XML/HTML. Use an XML/HTML parser like jsoup instead. With this library, your code could look like this:

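// Imports: org.jsoup.Jsoup, org.jsoup.nodes.Document/Element, org.jsoup.select.Elements
Document doc = Jsoup.parse(html); // html: the page source as a String
Elements links = doc.select("a[href]:matches(^[eE]ye\\b)"); // Elements extends ArrayList<Element>
for (Element link : links) System.out.println(link.attr("href")); // so you can iterate over it directly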

More info: http://jsoup.org/cookbook/extracting-data/selector-syntax

score 1 · Accepted Answer · edited Sep 01 '15 at 12:15

Add \b after eye:

href=\"(https?://)?[a-zA-z0-9?/&=\"+-_\\.# ]*>[Ee]ye\\b

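\b: assert position at a word boundary. The position between `eye` and a following letter (as in `eyebrights` or `eyewears`) is not a word boundary, so those links no longer match.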
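A minimal check of the pattern above, assuming it is compiled with `java.util.regex.Pattern` exactly as written; the class name `WordBoundarySketch` and the sample links are hypothetical. Links whose text is `eye` or starts with the word `Eye` still match, while `eyebrights` and `eyewears` do not. The character class is kept as in the question: `A-z` and `+-_` are ranges, so the class also admits characters such as `[`, `<` and `>`, and a match could stretch across several links on one line. The sample therefore puts each link on its own line, since `\n` is not in the class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundarySketch {
    // The accepted answer's pattern: "\\b" in the Java literal becomes the regex \b.
    static final Pattern EYE_WORD =
            Pattern.compile("href=\"(https?://)?[a-zA-z0-9?/&=\"+-_\\.# ]*>[Ee]ye\\b");

    // Collects every substring the pattern finds in the given HTML.
    static List<String> findAll(String html) {
        List<String> matches = new ArrayList<>();
        Matcher m = EYE_WORD.matcher(html);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // One link per line, because the character class does not include '\n'.
        String html = String.join("\n",
                "<a href=\"http://example.com/a\">eye</a>",
                "<a href=\"http://example.com/b\">Eye drops</a>",
                "<a href=\"http://example.com/c\">eyebrights</a>",
                "<a href=\"http://example.com/d\">eyewears</a>");
        findAll(html).forEach(System.out::println); // prints the matches for /a and /b
    }
}
```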

edited Sep 01 '15 at 12:15 by MC Emperor

answered Sep 01 '15 at 10:57 by Kerwin

If you want to ignore `xxeye`, then also add `\\b` before `[Ee]ye`. – Kerwin Sep 01 '15 at 10:59
Thanks, I'll try this! :) – just_do_IT Sep 01 '15 at 11:14
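A small illustration of the comment about a leading `\b`, assuming the word may appear in plain text rather than right after `>`; the class `LeadingBoundarySketch` and the helper `containsMatch` are hypothetical. With only a trailing `\b`, `xxeye` still matches, because the match can start in the middle of a word; adding `\b` before `[Ee]ye` rejects it. In the accepted pattern, `>` sits directly before `[Ee]ye`, and since `>` is not a word character that position is always a word boundary, so the extra `\b` changes nothing there.

```java
import java.util.regex.Pattern;

class LeadingBoundarySketch {
    // Only a trailing word boundary: "eye" must end a word but may start mid-word.
    static final Pattern TRAILING_ONLY = Pattern.compile("[Ee]ye\\b");
    // Word boundaries on both sides: "eye" must be a whole word.
    static final Pattern BOTH_SIDES = Pattern.compile("\\b[Ee]ye\\b");

    // True if the pattern matches anywhere in the text (Matcher.find semantics).
    static boolean containsMatch(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"eye", "xxeye", "eyebrights", "an Eye"}) {
            System.out.printf("%-12s trailing=%b both=%b%n", text,
                    containsMatch(TRAILING_ONLY, text), containsMatch(BOTH_SIDES, text));
        }
    }
}
```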
