Good day,

My Java code is as follows:

Pattern p = Pattern.compile("^[a-zA-Z0-9$&+,:;=\\[\\]{}?@#|\\\\'<>._^*()%!/~\"`  -]*$");
String i = "f698fec0-dd89-11e8-b06b-☺";
Matcher tagmatch = p.matcher(i);
System.out.println("tagmatch is " + tagmatch.find());

As expected, the result is false, because the string contains a ☺ character. However, I would like to show the column of the character that does not match. For this example, it should report column 25 as holding the invalid character.

How can I do this?

VLAZ
Panadol Chong

1 Answer

You should remove the anchors from your regex and then use the Matcher#end() method to get the position where the match stopped, like this:

String i = "f698fec0-dd89-11e8-b06b-☺";
Pattern p = Pattern.compile("[\\w$&+,:;=\\[\\]{}?@#|\\\\'<>.^*()%!/~\"`  -]+");
Matcher m = p.matcher(i);
if (m.lookingAt() && i.length() > m.end()) { 
   System.out.println("Match <" + m.group() + "> failed at: " + m.end());
}

Output:

Match <f698fec0-dd89-11e8-b06b-> failed at: 24
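
Note that the 24 printed above is a zero-based index, while the question asks for column 25. As a minimal sketch, assuming a 1-based column is wanted and that -1 is an acceptable "nothing invalid" marker, the same idea can be wrapped in a helper. The class name InvalidColumnFinder and the method firstInvalidColumn are hypothetical names chosen for this illustration, and ☺ is written as the escape \u263A only to sidestep source-encoding issues.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvalidColumnFinder {
    // Same character class as in the answer, without anchors.
    private static final Pattern ALLOWED =
        Pattern.compile("[\\w$&+,:;=\\[\\]{}?@#|\\\\'<>.^*()%!/~\"`  -]+");

    // Hypothetical helper: returns the 1-based column of the first
    // disallowed character, or -1 if every character is allowed.
    public static int firstInvalidColumn(String input) {
        Matcher m = ALLOWED.matcher(input);
        // lookingAt() must match at index 0 but need not consume the whole input.
        // If it fails, the valid prefix is empty.
        int validPrefixLength = m.lookingAt() ? m.end() : 0;
        return validPrefixLength == input.length() ? -1 : validPrefixLength + 1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidColumn("f698fec0-dd89-11e8-b06b-\u263A")); // 25
        System.out.println(firstInvalidColumn("\u263Aabc"));                      // 1
        System.out.println(firstInvalidColumn("abc"));                            // -1
    }
}
```

Comparing the prefix length with input.length() is what distinguishes a full match from a partial one, and treating a failed lookingAt() as an empty prefix covers the case where the very first character is invalid.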

PS: I have used lookingAt() to ensure that the pattern is matched starting from the beginning of the region. You can also use find() to get the next match anywhere in the input, or keep the start anchor in the pattern as

"^[\\w$&+,:;=\\[\\]{}?@#|\\\\'<>.^*()%!/~\"`  -]+"

and use find() to effectively make it behave like the above code with lookingAt().
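
To make the difference concrete, here is a small sketch comparing the three variants on an input whose first character is invalid. The class AnchorComparison and its method names are made up for this illustration, and each method returns -1 where the answer's code would print nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorComparison {
    // Character class from the answer, shared by all three variants.
    private static final String CLASS =
        "[\\w$&+,:;=\\[\\]{}?@#|\\\\'<>.^*()%!/~\"`  -]+";

    private static final Pattern UNANCHORED = Pattern.compile(CLASS);
    private static final Pattern START_ANCHORED = Pattern.compile("^" + CLASS);

    // lookingAt(): the match must begin at index 0.
    public static int endWithLookingAt(String s) {
        Matcher m = UNANCHORED.matcher(s);
        return m.lookingAt() ? m.end() : -1;
    }

    // find() with a leading ^: also forced to begin at index 0.
    public static int endWithAnchoredFind(String s) {
        Matcher m = START_ANCHORED.matcher(s);
        return m.find() ? m.end() : -1;
    }

    // find() without an anchor: may skip invalid characters at the front.
    public static int endWithUnanchoredFind(String s) {
        Matcher m = UNANCHORED.matcher(s);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String s = "\u263Aabc"; // the invalid character ☺ comes first
        System.out.println(endWithLookingAt(s));      // -1
        System.out.println(endWithAnchoredFind(s));   // -1
        System.out.println(endWithUnanchoredFind(s)); // 4, it matched "abc" after skipping ☺
    }
}
```

For the input from the question, all three return 24, because the valid prefix starts at index 0 in that case.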

Read about the difference between lookingAt() and find().

I have refactored your regex to use \w instead of [a-zA-Z0-9_], and used the quantifier + (match one or more) instead of * (match zero or more) so that a zero-length match is not reported as success.
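
The effect of the quantifier can be seen in a short sketch. It uses a deliberately simplified character class, [\w-], rather than the full one from the answer, and the class QuantifierDemo and method prefixEnd are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Simplified character class, used only for this illustration.
    public static final Pattern STAR = Pattern.compile("[\\w-]*");
    public static final Pattern PLUS = Pattern.compile("[\\w-]+");

    // Returns the end index of the match at the start of s, or -1 if lookingAt() fails.
    public static int prefixEnd(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String bad = "\u263Aabc"; // starts with the invalid character ☺
        System.out.println(prefixEnd(STAR, bad)); // 0: a zero-length match still counts as success
        System.out.println(prefixEnd(PLUS, bad)); // -1: at least one valid character is required

        // With default flags, \w covers the same characters as [a-zA-Z0-9_].
        System.out.println("a_9".matches("\\w+"));           // true
        System.out.println("a_9".matches("[a-zA-Z0-9_]+"));  // true
    }
}
```

With *, success alone says nothing about validity, so end() would still have to be compared with the input length. With +, a failed lookingAt() already signals that the first character is invalid.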

Arvind Kumar Avinash
anubhava
  • Or just keep the start anchor. – shmosel Jan 18 '23 at 06:16
  • Note that `lookingAt()`/`find()` will always return true because the `*` matches zero-length strings. You'll want to compare `end()` to `i.length()` to check if it's a full match or not. – shmosel Jan 18 '23 at 06:20
  • Good point, I have now suggested using `+` instead of `*` – anubhava Jan 18 '23 at 06:24
  • Your code doesn't print whether the expected pattern was matched. If there is an unexpected character at the beginning, nothing is output. Otherwise, it prints something, but I don't know if the whole thing matched. – モキャデ Jan 18 '23 at 06:42
  • @anubhava, can this also be applied the other way around? That is, I put $ at the end of my regex so that .find() returns false, and then only find out which column is causing the problem. – Panadol Chong Jan 18 '23 at 06:43
  • If you put `$` then `find()`/`lookingAt()` will fail, and then you can't use the `end()` method, as calling it would throw the exception `java.lang.IllegalStateException: No match available` – anubhava Jan 18 '23 at 06:55
  • @モキャデ: I have added `else` block to print `No match!!` for failure case. – anubhava Jan 18 '23 at 06:56
  • Only if you consider a partial match success, which doesn't appear to be OP's intent. See my previous comment. – shmosel Jan 18 '23 at 07:03
  • @shmosel: I get your point, have added length check in answer. – anubhava Jan 18 '23 at 07:13
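
The point raised in the comments about keeping `$` can be reproduced with a short sketch. It assumes a simplified character class, [\w-], in place of the full one, and the class AnchoredFailureDemo and method describe are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredFailureDemo {
    // Fully anchored pattern with a simplified character class (illustration only).
    private static final Pattern FULL = Pattern.compile("^[\\w-]*$");

    public static String describe(String s) {
        Matcher m = FULL.matcher(s);
        if (m.find()) {
            return "full match, end=" + m.end();
        }
        try {
            // No successful match exists, so there is no end position to report.
            m.end();
            return "unexpected: end() returned normally";
        } catch (IllegalStateException e) {
            return "no match, end() threw IllegalStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));          // full match, end=3
        System.out.println(describe("abc-\u263A"));   // no match, end() threw IllegalStateException
    }
}
```

A fully anchored pattern therefore only answers whether the whole input is valid. Locating the offending column needs the unanchored or start-anchored form, where the match succeeds on the valid prefix and end() is available.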