
I am trying to split a string using strsplit(str, '[,-\\+]'), intending any ',', '-' or '+' to act as a delimiter. However, this pattern also seems to match digits and capital letters. For example:

  • grep('[,-\\]', 'X'), returns 1
  • grep('[,-\\]', '46'), returns 1
  • grep('[,-\\]', '-'), returns 1
  • grep('[,-\\]', ','), returns 1

It seems that '[,-\\]' matches all digits, capital letters, ',' and '-'.

I just don't get why this is the case.

Thank you for any input.

Sotos
Li Sun
  • If I change the order of the pattern to '[\\+,-]', it only matches '+', ',' and '-'. I am confused. – Li Sun Aug 01 '17 at 13:37
  • Possible duplicate of [How to match hyphens with Regular Expression?](https://stackoverflow.com/questions/4068629/how-to-match-hyphens-with-regular-expression) – bobble bubble Aug 01 '17 at 17:24

1 Answer


You need to use

strsplit(str, '[,+-]')

to split on ',', '+' or '-'. If you also need to split on a backslash, use '[,\\+-]' with the default TRE regex engine.
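A quick check of both patterns, as a sketch using a made-up input string (the `x` value below is only an illustration and does not come from the question):

```r
# Hypothetical sample input containing all four candidate delimiters;
# the R literal "\\" is a single backslash character
x <- "a,b-c+d\\e"

# Split on ',', '+' or '-' (hyphen placed last, so it is literal)
parts1 <- strsplit(x, "[,+-]")[[1]]
parts1   # -> c("a", "b", "c", "d\\e")

# Also split on a literal backslash (default TRE engine, perl = FALSE):
# inside a TRE bracket expression the backslash is an ordinary character
parts2 <- strsplit(x, "[,\\+-]")[[1]]
parts2   # -> c("a", "b", "c", "d", "e")
```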

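When - is at the end (or at the start) of a bracket expression, it is parsed as a literal hyphen. In your pattern it sits between two characters, so it is treated as a range operator, and '[,-\\]' matches every character from ',' to '\' in code-point order: , - . / 0-9 : ; < = > ? @ A-Z [ and \. That range is why digits and capital letters match, and why your reordered '[\\+,-]' (hyphen last) behaves as expected.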

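To see the range concretely, the sketch below enumerates the code points between the two endpoints and compares the original pattern with one where the hyphen is last. It assumes a locale where TRE compares bracket ranges by code point, which matches the behaviour reported in the question.

```r
# Code points of the range endpoints in '[,-\\]'
lo <- utf8ToInt(",")    # 44
hi <- utf8ToInt("\\")   # 92

# Every character the range ',' .. '\' covers
rng <- intToUtf8(lo:hi, multiple = TRUE)
paste(rng, collapse = "")
# ",-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\"

inputs <- c("X", "46", "-", ",", "+")

# The original pattern matches digits and capitals; '+' (43) is outside the range
m_range <- grepl("[,-\\]", inputs)   # TRUE TRUE TRUE TRUE FALSE

# With the hyphen last it is literal, so only ',', '+' and '-' match
m_fixed <- grepl("[,+-]", inputs)    # FALSE FALSE TRUE TRUE TRUE
```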

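Note that you are using the TRE regex flavor here (no perl=TRUE is specified), so the double backslash in the string literal becomes a single backslash in the regex, and TRE treats it as a literal backslash inside the bracket expression. The "[,-\\]" pattern would be invalid with the PCRE engine (perl=TRUE), because there the backslash escapes the closing ], leaving the bracket expression unterminated; to denote a literal backslash with PCRE you would need four backslashes in the string literal.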
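A small sketch contrasting the two engines on the same patterns. The `tryCatch` wrapper is only there so the PCRE compilation error can be observed without stopping the script; R may also print a warning about the failed pattern compilation.

```r
# Default engine (TRE): "[,-\\]" is the regex [,-\] and compiles fine
tre_ok <- grepl("[,-\\]", "X")                  # TRUE

# PCRE (perl = TRUE): in [,-\] the backslash escapes ']', so the class
# is never closed and compiling the pattern fails
res <- tryCatch(grepl("[,-\\]", "X", perl = TRUE),
                error = function(e) "error")
res                                              # "error"

# With PCRE, four backslashes in the R literal give the regex [,-\\],
# i.e. the same ',' .. '\' range, so "X" still matches
pcre_range <- grepl("[,-\\\\]", "X", perl = TRUE)  # TRUE

# Putting the hyphen last is safe with either engine
pcre_fixed <- grepl("[,+-]", "X", perl = TRUE)     # FALSE
```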

Wiktor Stribiżew