
I'm learning how to use regular expressions in R, but struggling a bit. I understood from the R documentation and other sources that . represents any character and that * means that the preceding pattern may occur any number of times (including zero).

So if I try something like:

gsub(".* ","test_","abcd efgh")

The part of the string prior to the space is replaced and I get "test_efgh", as expected.
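A minimal sketch of why this works: `.*` is greedy, so it consumes as much text as it can while still letting the following space match. With several spaces it therefore runs up to the last one. The input `"ab cd ef"` is a made-up example, not from the question; `regmatches()` with `regexpr()` is used only to display the matched text.

```r
x <- "ab cd ef"  # hypothetical input with two spaces

# Greedy ".*" stretches to the LAST space, so only "ef" survives.
replaced <- gsub(".* ", "test_", x)
print(replaced)   # "test_ef"

# Show exactly which part of the string the pattern consumed.
consumed <- regmatches(x, regexpr(".* ", x))
print(consumed)   # "ab cd "
```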

I then tried to use [:space:] instead of " ". From the R documentation:

[:space:]

Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters
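To see which characters the class covers, here is a small sketch using `grepl()`. The test vector is invented for illustration, with one of each character listed above plus an ordinary letter. Note that the class name is written inside an outer bracket expression, `[[:space:]]`, which is where R's regex syntax recognizes POSIX class names.

```r
# One of each documented space character, plus a letter that should not match.
chars <- c(" ", "\t", "\n", "\v", "\f", "\r", "x")

# [:space:] is only a class name when it sits inside [ ... ].
is_space <- grepl("[[:space:]]", chars)
print(is_space)   # TRUE for the six whitespace characters, FALSE for "x"
```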

But I got quite different outputs, as shown below:

gsub(".*[:space:]","test_","abcd efgh")
#"test_fgh" : "e" is missing
gsub(".*[:space:]","test_","bcd df")
#"test_d df" : the first "d" was not replaced
gsub(".*[:space:]","test_","bcd bcde")
#"test_" : everything was cleared after the space
gsub(".*[:space:]","test_","bb df")
#"bb df" : nothing was replaced

All these examples work as expected if I use " " instead of [:space:], so I guess I'm missing something about the latter. I don't understand why it doesn't work in these cases, nor why the outputs differ so much: in cases 1 and 3, letters I wanted to keep were removed, whereas in cases 2 and 4, letters I wanted removed were kept.
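The four outputs are consistent with reading `[:space:]` as an ordinary bracket expression, i.e. a set of single characters (`:`, `s`, `p`, `a`, `c`, `e`), rather than as the space class. A sketch reproducing this: `.*` then runs greedily up to the last character from that set, and writing the set out by hand (`[spaec:]`, a spelling chosen here purely for comparison) gives the same results.

```r
inputs <- c("abcd efgh", "bcd df", "bcd bcde", "bb df")

# Without an outer pair of brackets, "[:space:]" is just the set
# {':', 's', 'p', 'a', 'c', 'e'}; a real space is not in it.
set_hits <- grepl("[:space:]", c(" ", "s", ":"))
print(set_hits)   # FALSE  TRUE  TRUE

# ".*" stretches to the LAST character from that set (or nothing matches).
original <- gsub(".*[:space:]", "test_", inputs)
print(original)   # "test_fgh"  "test_d df"  "test_"  "bb df"

# Spelling the same set out explicitly gives identical results.
explicit <- gsub(".*[spaec:]", "test_", inputs)
print(identical(original, explicit))  # TRUE
```

This accounts for each case: in "abcd efgh" the last set character is the `e` after the space; in "bcd df" it is the `c`; in "bcd bcde" it is the final `e`; and "bb df" contains none of them, so nothing is replaced.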

How should I use [:space:], and why do I get different outputs from seemingly identical uses?

Vincent
    The `[:space:]` bracket expression matches a single char: `:`, `s`, `p`, `a`, `c` or `e`. Use the POSIX character class inside a bracket expression: `[[:space:]]`. – Wiktor Stribiżew Sep 06 '17 at 09:49
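Applying the fix from the comment to the question's four inputs, as a sketch: the class goes inside an outer bracket expression, `[[:space:]]`. R's `?regex` page also documents `\s` as a shorthand for the space class (written `"\\s"` in an R string), shown here as an alternative.

```r
inputs <- c("abcd efgh", "bcd df", "bcd bcde", "bb df")

# Correct: POSIX class [:space:] wrapped in an outer bracket expression.
fixed <- gsub(".*[[:space:]]", "test_", inputs)
print(fixed)   # "test_efgh"  "test_df"  "test_bcde"  "test_df"

# Equivalent shorthand for the space class.
shorthand <- gsub(".*\\s", "test_", inputs)
print(identical(fixed, shorthand))  # TRUE
```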
