0

Let's say I want a regular expression that matches only the numbers 18 through 31. What is the right way to do this?

I have a set of strings that look like this:

"quiz.18.player.total_score" 
"quiz.19.player.total_score" 
"quiz.20.player.total_score" 
"quiz.21.player.total_score"

I want to match only the strings that contain a number from 18 to 31, and am currently trying something like this:

(quiz.)[1-3]{1}[1-9]{1}.player.total_score

This obviously won't work: it matches any number from 11 to 39 whose second digit is not 0, so it accepts 11-17 and 32-39 and rejects 20 and 30. What is the right way to do this?

Parseltongue
  • Possible duplicate of [Why doesn't \[01-12\] range work as expected?](https://stackoverflow.com/questions/3148240/why-doesnt-01-12-range-work-as-expected) – acylam Sep 06 '18 at 18:53

3 Answers

3

Regex: 1[89]|2\d|3[01]

To match the whole string, add the surrounding text and escape the dots:

quiz\.(?:1[89]|2\d|3[01])\.player\.total_score

Details:

  • (?:...) non-capturing group
  • [] matches a single character present in the list
  • | or (alternation)
  • \d matches a digit (equal to [0-9])
  • \. matches a literal dot
  • . matches any character
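
In R the backslashes have to be doubled inside a string literal, so the pattern could be applied roughly as in the sketch below. The vector `s` is made-up sample data, and `perl = TRUE` is a choice made here so that `\d` and `(?:...)` are handled by the PCRE engine; neither comes from the answer itself.

```r
# Hypothetical sample data (not from the question), including out-of-range numbers
s <- c("quiz.17.player.total_score",
       "quiz.18.player.total_score",
       "quiz.25.player.total_score",
       "quiz.31.player.total_score",
       "quiz.32.player.total_score")

# Backslashes are doubled inside an R string literal
pattern <- "quiz\\.(?:1[89]|2\\d|3[01])\\.player\\.total_score"

# Keep only the strings whose number lies in 18-31
matched <- s[grepl(pattern, s, perl = TRUE)]
print(matched)
```

Because the dots are escaped and the literal text surrounds the alternation, a number such as 17 or 32 cannot slip through by matching only part of the string.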
Srdjan M.
  • Ah! This is great. Can you explain why you used a non-capturing group? – Parseltongue Sep 06 '18 at 01:28
  • @Parseltongue I guess it's not strictly necessary, but it should be more efficient: https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do Basically, you're using the grouping operator `()` only incidentally: you are not trying to capture anything, just to combine some patterns with `|`. So you can switch off the capturing with `?:`. – MichaelChirico Sep 06 '18 at 02:06
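
The visible difference between the two kinds of group can be sketched in R as follows. The string `x` and the shortened patterns are illustrative assumptions, and the sketch shows only what each pattern returns; it makes no measurement of speed.

```r
# Illustrative input (assumed, not from the question)
x <- "quiz.18.player.total_score"

capturing     <- "quiz\\.(1[89]|2\\d|3[01])\\.player"
non_capturing <- "quiz\\.(?:1[89]|2\\d|3[01])\\.player"

# regexec() reports the full match followed by one entry per capturing group
with_group    <- regmatches(x, regexec(capturing, x, perl = TRUE))[[1]]
without_group <- regmatches(x, regexec(non_capturing, x, perl = TRUE))[[1]]

print(with_group)     # full match plus the captured number
print(without_group)  # full match only
```

Both patterns accept the same strings; the capturing version additionally stores the number as a separate result.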
2

1) If s is the character vector, read the fields into a data frame, pick off the second field and check whether it is in the desired range. Use the resulting logical vector to subset s. This uses no regular expressions and only base R.

digits <- read.table(text = s, sep = ".")$V2
s[digits %in% 18:31]

2) Another approach, based on the pattern "\\D" matching any non-digit, is to remove all such characters and then check whether what is left is in the desired range. This assumes the number is the only run of digits in each string:

digits <- gsub("\\D", "", s)
s[digits %in% 18:31]

2a) In R 3.6.0 and later we could alternatively use the whitespace argument of trimws like this:

digits <- trimws(s, whitespace = "\\D")
s[digits %in% 18:31]

3) Another alternative is simply to construct the boundary strings and compare s to them. This works only if the number parts in s all have exactly the same number of digits (which is the case for the sample shown in the question).

ok <- s >= "quiz.18.player.total_score" & s <= "quiz.31.player.total_score"
s[ok]
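
The approaches above leave `s` undefined, so here is a minimal sketch that runs three of them on made-up data. The sample strings, including the single-digit "quiz.2" entry, are assumptions added to illustrate the same-width caveat of approach 3.

```r
# Hypothetical sample data; "quiz.2" is included to expose the caveat of approach 3
s <- c("quiz.2.player.total_score",
       "quiz.17.player.total_score",
       "quiz.18.player.total_score",
       "quiz.31.player.total_score",
       "quiz.32.player.total_score")

# Approach 1: parse the second dot-separated field as a number
by_field <- s[read.table(text = s, sep = ".")$V2 %in% 18:31]

# Approach 2: strip every non-digit character, then compare
by_digits <- s[gsub("\\D", "", s) %in% 18:31]

# Approach 3: plain string comparison against the boundary strings
by_string <- s[s >= "quiz.18.player.total_score" &
               s <= "quiz.31.player.total_score"]

print(by_field)
print(by_digits)
print(by_string)  # expected to include "quiz.2...", since "2" sorts between "1" and "3"
```

On this data the first two approaches should keep only the 18 and 31 entries, while the string comparison also lets the single-digit entry through, which is why it is safe only when every number has the same width.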
G. Grothendieck
1

This is done using character ranges and alternation. For your range:

3[10]|[2][0-9]|1[8-9]
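
Used on its own, an alternation like this matches anywhere inside a longer number, so it usually needs to be wrapped in the surrounding text. The sketch below illustrates that in R; the sample strings and the `anchored` pattern are assumptions, not part of the answer.

```r
bare <- "3[10]|[2][0-9]|1[8-9]"

# Wrap the alternation in a group and pin it between the literal parts of the string
anchored <- paste0("^quiz\\.(", bare, ")\\.player\\.total_score$")

# Hypothetical sample data, including a three-digit number that contains "18"
s <- c("quiz.18.player.total_score",
       "quiz.118.player.total_score",
       "quiz.40.player.total_score")

bare_hits     <- grepl(bare, s)      # the bare pattern also hits "118"
anchored_hits <- grepl(anchored, s)  # only the first string qualifies

print(bare_hits)
print(anchored_hits)
```

The parentheses matter here: without them the `^quiz\.` prefix would attach only to the first alternative and the suffix only to the last.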


wp78de