
Why does the following code evaluate differently?

For example:

str_detect(c('12312312','PO123123', 'ABCDDBCA'),'((\\d)|(A-Z)){8}')

evaluates to

TRUE FALSE FALSE

While

str_detect(c('12312312','PO123123', 'ABCDDCBA'),'[\\d|A-Z]{8}')

evaluates to

TRUE TRUE TRUE

I understand that the '|' character in the second expression is not needed, since within square brackets it is treated as a literal character rather than as the "or" operator. Nonetheless, I don't understand why the first regex doesn't evaluate to TRUE TRUE TRUE.

Thank you

I expected both to evaluate to TRUE TRUE TRUE

JonP

1 Answer


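((\\d)|(A-Z)){8} matches 8 consecutive repetitions of a group, where each repetition has to be either a digit or the literal three-character text A-Z: outside square brackets, A-Z is not a range. Only '12312312' contains eight such units in a row, which is why the other two strings give FALSE. Here is a demo of what will be matched by your first regex.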
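
A quick way to check this in R, sketched with base grepl(..., perl = TRUE) standing in for str_detect() so that no package is needed (an assumption: PCRE and stringr's ICU engine treat these particular patterns the same way). It shows that the second alternative only ever matches the literal text "A-Z":

```r
# Base R stand-in for str_detect(); perl = TRUE selects the PCRE engine.
first <- "((\\d)|(A-Z)){8}"

x <- c("12312312", "PO123123", "ABCDDBCA")
grepl(first, x, perl = TRUE)
#> [1]  TRUE FALSE FALSE

# Eight copies of the literal text "A-Z" (24 characters) also match,
# because (A-Z) outside brackets is the three characters A, -, Z.
grepl(first, strrep("A-Z", 8), perl = TRUE)
#> [1] TRUE

# Alternatives can be mixed: 1, A-Z, 2, 3, 4, 5, 6, 7 are eight units.
grepl(first, "1A-Z234567", perl = TRUE)
#> [1] TRUE
```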

(...) is a group: it matches the pattern specified inside it as a whole, in sequence.
[...] is a character class: it matches any single character listed inside, given either as individual characters (for example abc) or as a range of characters (for example a-c).
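
A small sketch of the difference, again using base grepl(..., perl = TRUE) as an assumed stand-in for str_detect(). The patterns are anchored with ^ and $ here only so that each call tests the whole string:

```r
# A group matches its contents in sequence: "abc" as a whole.
grepl("^(abc)$", c("abc", "a", "b"), perl = TRUE)
#> [1]  TRUE FALSE FALSE

# A character class matches exactly ONE character from the listed set.
grepl("^[abc]$", c("abc", "a", "b"), perl = TRUE)
#> [1] FALSE  TRUE  TRUE

# Inside a class, a-c is a range, so "-" itself is not matched here.
grepl("^[a-c]$", c("b", "-"), perl = TRUE)
#> [1]  TRUE FALSE

# Inside a class, | is an ordinary character, not alternation.
grepl("^[\\d|A-Z]$", c("7", "Q", "|"), perl = TRUE)
#> [1] TRUE TRUE TRUE
```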

Most likely you wanted to write (\\d|[A-Z]){8}. That matches 8 characters, each of which is a digit or an uppercase letter. Simplified, this expression becomes [\\dA-Z]{8}.
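
A sketch comparing the corrected and simplified patterns on the strings from the question, with base grepl(..., perl = TRUE) assumed as a stand-in for str_detect(). The last two calls add a point not raised in the question: like str_detect(), grepl() is unanchored, so {8} only requires 8 valid characters somewhere in the string unless ^ and $ are added.

```r
x <- c("12312312", "PO123123", "ABCDDBCA")

fixed      <- "(\\d|[A-Z]){8}"  # alternation: a digit or an uppercase letter
simplified <- "[\\dA-Z]{8}"     # the same set written as one character class

grepl(fixed, x, perl = TRUE)
#> [1] TRUE TRUE TRUE
grepl(simplified, x, perl = TRUE)
#> [1] TRUE TRUE TRUE

# Unanchored: 8 valid characters embedded in a longer string still match.
y <- c("xx12345678yy", "ABCD1234", "ABC1234")
grepl(simplified, y, perl = TRUE)
#> [1]  TRUE  TRUE FALSE

# Anchored: the whole string must be exactly 8 digits/uppercase letters.
grepl("^[\\dA-Z]{8}$", y, perl = TRUE)
#> [1] FALSE  TRUE FALSE
```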

markalex