I'm learning how to use regular expressions in R, but struggling a bit.
I understood from the R documentation and other sources that `.` represents any character and that `*` means the preceding pattern may occur any number of times (including zero).
So if I try something like this:
gsub(".* ","test_","abcd efgh")
Everything up to and including the space is replaced, and I get "test_efgh", as expected.
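The small R sketch below makes the greedy behaviour of `.*` concrete. The second input, which contains two spaces, is a hypothetical example and does not come from the question. Because `.*` matches as much text as it can, the match should run up to the last space rather than the first one.

```r
# `.` matches any character and `*` repeats it zero or more times. Together,
# `.*` is greedy: it consumes as much text as possible while still allowing
# the rest of the pattern (here a literal space) to match.
one_space  <- gsub(".* ", "test_", "abcd efgh")  # the call from the question
two_spaces <- gsub(".* ", "test_", "ab cd ef")   # hypothetical input with two spaces

print(one_space)   # "test_efgh"
print(two_spaces)  # "test_ef": the match extends to the LAST space
```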
I then tried using `[:space:]` instead of `" "`. From the R documentation:
`[:space:]` Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters.
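Here is a minimal probe of which characters this class matches. It places the class inside an outer pair of brackets, `[[:space:]]`. That follows the examples on R's `?regex` help page, such as `[[:alnum:]]`, where the class name's own brackets appear in addition to the brackets of the bracket list. The sample characters and their labels are hypothetical. They mirror the list quoted above, plus two ordinary letters for contrast.

```r
# Hypothetical sample: the characters named in the documentation, plus two
# ordinary letters. The names are labels for printing only.
chars <- c(space = " ", tab = "\t", newline = "\n", vtab = "\v",
           formfeed = "\f", cr = "\r", letter_a = "a", letter_s = "s")

# The class is written inside an outer bracket expression: [[:space:]]
is_space <- grepl("[[:space:]]", chars)
names(is_space) <- names(chars)  # grepl() does not keep the input names

print(is_space)  # whitespace entries should be TRUE, the two letters FALSE
```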
But I got very different outputs:
gsub(".*[:space:]","test_","abcd efgh")
# "test_fgh": the "e" is missing
gsub(".*[:space:]","test_","bcd df")
# "test_d df": the first "d" was not replaced
gsub(".*[:space:]","test_","bcd bcde")
# "test_": everything was removed, including the part after the space
gsub(".*[:space:]","test_","bb df")
# "bb df": nothing was replaced
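The four calls can be bundled into one reproducible R snippet. The expected values are the outputs reported in the comments. For the fourth case, the expected value is simply the unchanged input, because nothing is replaced.

```r
# The four inputs from the question; gsub() is vectorised over `x`,
# so a single call runs all of them.
inputs  <- c("abcd efgh", "bcd df", "bcd bcde", "bb df")
outputs <- gsub(".*[:space:]", "test_", inputs)

print(outputs)  # "test_fgh" "test_d df" "test_" "bb df"
```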
All these examples work as expected if I use `" "` instead of `[:space:]`, so I guess I'm missing something about the latter. I don't understand why it fails in these cases, nor why the outputs differ so much: in cases 1 and 3, letters I wanted to keep were removed, whereas in cases 2 and 4, letters I wanted removed were kept.
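One hypothesis is worth checking, offered as an assumption rather than a confirmed diagnosis. Without an outer pair of brackets, `[:space:]` may be parsed as an ordinary bracket expression that lists the literal characters `:`, `s`, `p`, `a`, `c` and `e`. If so, it should behave exactly like the hand-written set `[:spaec]`, which is a hypothetical pattern introduced only for this comparison. Greedy `.*` would then run up to the last of those letters in each string. The sketch compares both against the literal-space version that the question reports as working.

```r
inputs <- c("abcd efgh", "bcd df", "bcd bcde", "bb df")

# Reference: a literal space, which behaves as expected.
with_space <- gsub(".* ", "test_", inputs)

# Assumption under test: "[:space:]" with single brackets is just a set of
# the characters ':', 's', 'p', 'a', 'c', 'e'. "[:spaec]" spells that set out.
single_bracket <- gsub(".*[:space:]", "test_", inputs)
explicit_set   <- gsub(".*[:spaec]", "test_", inputs)

print(data.frame(inputs, with_space, single_bracket, explicit_set))

# TRUE if both patterns give identical results on these inputs.
print(identical(single_bracket, explicit_set))
```

If the last line prints `TRUE`, the differing outputs would be explained by where the letters a, c, e, s and p happen to occur in each string, not by where the space is.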
How should I use `[:space:]`, and why do I get such different outputs from seemingly identical calls?