
I want to check whether the numbers in my vector match a specific format (nnn.nnn.nnnn). I expect the code to return the logical vector (FALSE, TRUE, FALSE, TRUE, FALSE, FALSE), but the last element returns TRUE when I want it to be FALSE.

```r
library(stringr)

numbers <- c('571-566-6666', '456.456.4566', 'apple', '222.222.2222', '222 333 4444',
             '2345.234.2345')

str_detect(numbers, "[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}")
```

If I use:

```r
str_detect(numbers, "[:digit:]{4}\\.[:digit:]{3}\\.[:digit:]{4}")
```

I get (FALSE, FALSE, FALSE, FALSE, FALSE, TRUE), so I know the pattern matching itself works, but I am not sure why the first block of code returns TRUE for the last element when it has 4 digits, not 3, before the first '.'.

Random Cotija
C.Lee
  • I didn't think either of the nominated questions was an exact match (which is a bit ironic given the question). This is how to get a pattern match rather than an exact match – IRTFM Jul 12 '18 at 19:17
  • @42- This **is** a duplicate of [How to use grep() to find exact match](https://stackoverflow.com/questions/26813667/how-to-use-grep-to-find-exact-match), please reclose. Word boundaries or anchors are all OP needs here. – Wiktor Stribiżew Jul 12 '18 at 19:29
  • I agree that the strategy of using word boundaries is sufficient, but the term "exact match" does not in my mind cover patterns that have character classes interspersed with periods. The questioners in the earlier questions that were cited did not ask for what I would call a patterned match, and none of the respondents generalized their answers to cover that possibility. – IRTFM Jul 12 '18 at 19:38


It is because the last value contains `345.234.2345` at the end, and your pattern does not require the match to start at the beginning of the string and stop at its end.
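To see which substring actually triggers the match, a minimal base R sketch can ask `regexpr()` where the unanchored pattern first matches. It uses the double-bracketed `[[:digit:]]` form so it runs in base R without stringr; the variable names `pattern`, `x`, `start` and `matched` are only illustrative.

```r
# Question's unanchored pattern, written for base R (needs [[:digit:]])
pattern <- "[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}"
x <- "2345.234.2345"

# regexpr() gives the position of the first match (or -1 if there is none)
m <- regexpr(pattern, x)
start <- as.integer(m)
matched <- regmatches(x, m)

# Matching at position 1 fails ("234" is followed by "5", not "."),
# so the engine moves on and succeeds at position 2
cat("match starts at character", start, "->", matched, "\n")
```

Nothing forces the match to begin at the first character, so the leading `2` is simply skipped.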

Try this pattern:

```r
"^[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}$"
```
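As a quick check, here is a base R sketch of the same anchored pattern applied to the question's vector. It uses `grepl()` with the double-bracketed class (which base R requires) instead of `str_detect()`, so it runs without stringr; `anchored` and `result` are illustrative names.

```r
numbers <- c('571-566-6666', '456.456.4566', 'apple', '222.222.2222',
             '222 333 4444', '2345.234.2345')

# ^ and $ force the whole string to be exactly nnn.nnn.nnnn
anchored <- "^[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}$"
result <- grepl(anchored, numbers)
print(result)
# [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
```

This is the vector the question expected; `'2345.234.2345'` now fails because the match would have to begin at the first character.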

If you want to match the number when it sits inside a longer string, separated from the rest by spaces or by the start or end of the string, a more general pattern is:

```r
"(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)"
```

Testing:

```r
numbers <- c('571-566-6666', '456.456.4566', 'apple', '222.222.2222', '222 333 4444',
             '2345.234.2345', "interior test 456.456.4566 other",
             '456.456.4566 beginning test', "end test 456.456.4566")

str_detect(numbers, "(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)")
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
```

And as Wiktor points out, you could also use the word-boundary anchor `\b`, as long as you double the backslash (`\\b`) inside an R string.

```r
grepl("\\b[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}\\b", numbers)
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
```
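The two approaches agree on the test vector, but they are not equivalent. `\b` treats any non-word character (punctuation, `.`) as a boundary, while the space-delimited pattern only accepts a space or the string edge. A small base R sketch with made-up inputs (`tricky` is hypothetical) shows where they diverge:

```r
space_delim <- "(^|[ ])[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}([ ]|$)"
word_bound  <- "\\b[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}\\b"

# Hypothetical inputs: punctuation touching the number, and a longer dotted run
tricky <- c("tel:456.456.4566", "(456.456.4566)", "12.345.678.9012")

sd_hits <- grepl(space_delim, tricky)  # needs a space or string edge around the number
wb_hits <- grepl(word_bound, tricky)   # ":", "(", ")" and "." all count as boundaries

print(data.frame(input = tricky, space_delimited = sd_hits, word_boundary = wb_hits))
```

Which behaviour is preferable depends on the data: `\b` is friendlier to numbers wrapped in punctuation, but it also accepts the tail of `12.345.678.9012`.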

Caveat: the stringr functions (which are built on the stringi package and its ICU regex engine) differ from base R's regex functions in that they accept the POSIX-style character classes without the extra outer brackets:

```r
grepl("(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)", numbers)
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
grepl("(^|[ ])[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}([ ]|$)", numbers)
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
```

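This is most likely not an implicit `fixed = TRUE`. Base R's default regex engine (TRE) reads a single-bracketed `[:digit:]` as an ordinary bracket expression that matches any one of the characters `:`, `d`, `i`, `g` or `t`, so that pattern never matches digits. The ICU engine behind stringr, by contrast, treats `[:digit:]` as the digit class even without the outer brackets, as the `str_detect()` results above show.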
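A quick way to check how base R interprets the single-bracketed form is to test it against individual characters. This sketch uses plain `grepl()` (no `perl = TRUE`, which would reject the single-bracketed form instead); `chars`, `single` and `double` are illustrative names.

```r
chars <- c("5", "d", ":")

# Single brackets: a bracket expression containing ':', 'd', 'i', 'g', 't'
single <- grepl("[:digit:]", chars)

# Double brackets: the POSIX class [:digit:] inside a bracket expression
double <- grepl("[[:digit:]]", chars)

print(data.frame(chars, single, double))
```

If the single-bracketed version flags `"d"` and `":"` but not `"5"`, it is behaving as a literal character set rather than as a digit class.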

IRTFM