
Using q’s like function, how can we achieve the following match using a single regex string regstr?

q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b

That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.

Using regstr:"foo[8-12]" does not work, because square brackets match a single character and 12 is not a single digit (how does q interpret this pattern?), while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.

cillianreilly
foam78
  • Question has been incorrectly marked as duplicate. KDB has a very simplified form of regex: https://code.kx.com/q/basics/regex/. `|` isn't supported, and only these ranges are: `[0-9] [a-z] [A-Z]`. This won't include 10 - 12. For a kdb answer I think this might need to be 2 likes with or: `(l like "foo1[0-2]") or l like "foo[8-9]"`, which returns `0111110b` – Matt Moore Nov 03 '22 at 20:05
  • Doc recommends external C libraries for more complicated regex: https://code.kx.com/q/basics/regex/#regex-libraries – Matt Moore Nov 03 '22 at 20:06
  • Thanks Matt, does this mean there is not the functionality to use a single `regstr`? – foam78 Nov 03 '22 at 23:29

4 Answers


As the other comments and answers mention, this can't be done using a single regex. An alternative is to construct the list of strings that you want to match against:

q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b

If your eventual goal is to filter on the matching entries, you can replace in with inter:

q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
cillianreilly

As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known, as in the example above, then the following can be used to check multiple patterns:

q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b

Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function that implements this logic. The function range checks that each string in a list starts with a given prefix and ends with a number within the specified range.

range:{
  / checking for strings starting with string y
  s:((c:count y)#'x)like y;
  / convert remainder of string to long, check if within range
  d:("J"$c _'x)within z;
  / find strings satisfying both conditions
  s&d
 }

Example use:

q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"

This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
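
One way this might look is sketched below. The name range2 is hypothetical and not part of the original answer, and the sketch assumes the prefix y contains no characters that like treats specially (?, *, [ or ]), since it is used to build the pattern y,"*".

```q
/ range2 is a hypothetical name; x: list of strings, y: prefix, z: (low;high)
range2:{
  / indices of the strings that start with the prefix y
  i:where x like y,"*";
  / parse the remainder of those candidates only, keep the ones within z
  j:i where("J"$(count y)_'x i)within z;
  / boolean mask over the original list, true at the matching indices
  @[(count x)#0b;j;:;1b]
 }
```

Called as range2[str;"foo";8 12], it is meant to return the same mask as range, while the "J"$ parse only runs on the strings that passed the prefix test.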

Thomas Smyth - Treliant
  • Thank you for the examples, they would be useful in the case of checking for the length of the trailing string of digits. I am interested in matching only a specific range of values which includes numbers of different digit lengths. For example, matching for 8..12 but not <8 or >12. – foam78 Nov 03 '22 at 23:27
  • Updated my answer to better answer the question. – Thomas Smyth - Treliant Nov 04 '22 at 00:02

For your example you can pad each string to a fixed width, fill the padding with a character, and then a simple pattern works fine:

("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"
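
To see what the pattern is actually testing, it can help to look at the padded strings. This is only a sketch: the inputs foo1 and foo80 are added here for illustration and are not part of the question, and the result in the last comment follows from reading each bracket as a set of single characters.

```q
/ pad to width 5, then replace the padding spaces (null chars) with "."
p:"."^5$("foo7";"foo8";"foo10";"foo13";"foo1";"foo80")
/ p is ("foo7.";"foo8.";"foo10";"foo13";"foo1.";"foo80")
/ 4th char must be one of 1 | 8 9, 5th char one of . | 0 1 2
p like "foo[1|8-9][.|0-2]"
/ by that reading the result is 011011b: foo1 and foo80 match as well
```

So the pattern fits the seven strings in the question, but it is a character-position match rather than a numeric range check.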
Sean O'Hagan

A variation on Cillian’s method: test the prefix and numbers separately.

q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')@'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b

Note how the unary functions x~/: and in[;string range y]' are paired by @' with the split strings, and min is then used to AND the results:

q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82"  ,"7"  ,"8"  ,"9"  "10"  "11"  "12"  "13"
q)("foo"~/:;in[;string range 8 12]')@'flip 3 cut's
11111111b
00111110b

Compositions rock.

SJT