3

I want to match only 911 or 1911 from a string with any number of preceding or ending * or #.

My Regex:

[^0-9]\*[1-9]{3,4}[^0-9]*

Test code below returns true when i was expecting it to be false:

select Digits
from (select '*11911#' as Digits) A
where Digits rlike '[^0-9]\*[1-9]{3,4}[^0-9]*'

What am I doing wrong?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
usyed
  • 51
  • 1
  • 3
    I don't see how your regex corresponds to your requirements. It doesn't even have a `#`. – shmosel Jun 25 '21 at 00:40
  • 1
    I recommend reading up on regex syntax and using a tool like [regex101](https://regex101.com/r/HYSFVB/1) to explain the current regex and possibly create a correct one for the given use case. – Turing85 Jun 25 '21 at 00:42
  • 1
    If I'm reading the requirement right, the regex would look something like `[*#]*1?911[*#]*`. – shmosel Jun 25 '21 at 00:42
  • A little [test case on regex101](https://regex101.com/r/UfTFLd/1) to @shmosel's proposed solution (I took the liberty to include start-of-string and end-of-string validation). – Turing85 Jun 25 '21 at 00:44
  • can you not do `INSTR (col, '911')` ? Do you have to use regex? your requirement looks simple. – Koushik Roy Jun 25 '21 at 04:40
  • Answered: it was two problems: incorrectly escaped * and extra * at the end – leftjoin Jun 25 '21 at 07:29

1 Answers1

0

BTW when escaping in Hive, you should use double-backslash: \\* or use [*], to avoid unpredicted behavior (sometimes single backslash works as escaping, sometimes not, double backslash always works as escape in Hive).

This '[^0-9]\\*[1-9]{3,4}[^0-9]*' - does not match, * correctly escaped and you have nothing before * in the string.

Let's remove [^0-9] before \\* and check again:

This returns no rows:

select Digits
from (select '*11911#' as Digits) A
where Digits rlike '\\*[1-9]{3,4}[^0-9]'

Also this '\\*[1-9]{3,4}[^0-9]+' does not match

And this matches:

'\\*[1-9]{3,4}[^0-9]*' 

Because * at the end means 0 or more times, it matches perfectly: there are 4 [1-9] and zero non digits in a row.

On regex101 it works the same: last * makes it matching

leftjoin
  • 36,950
  • 8
  • 57
  • 116