I have this kind of data:
MWE <- c(
"World1 2.6 -4.5 1.5 5.0 -0.2",
"1,2",
"G20 112.9 -4.1 1.6 5.7 0.4"
)
The desired output is:
[1] " 2.6 -4.5 1.5 5.0 -0.2"
[2] ""
[3] " 112.9 -4.1 1.6 5.7 0.4"
I want to separate what is a number from what is not. (In this particular case, the "1,2" is a data-mining artifact: it refers to footnotes for "G20", and it is not a number I want to extract.)
I think the correct regex for this number format is therefore [-+]?\\d+\\.\\d
And it works in the positive sense, i.e. when I replace the matches themselves:
> MWE2 <- gsub("[-+]?\\d+\\.\\d","blah",MWE)
> MWE2
[1] "World1 blah blah blah blah blah"
[2] "1,2"
[3] "G20 blah blah blah blah blah"
But what I actually want is to isolate the values by replacing everything that is not a match with nothing. From what I have read, a negative lookahead (?! ) is what I am looking for, so I tried (?![-+]?\\d+\\.\\d). It does not seem to work, even with the perl=T option, which I added after reading that lookaheads require it:
> MWE3 <- gsub("(?![-+]?\\d+\\.\\d)","",MWE,perl=T)
> MWE3
[1] "World1 2.6 -4.5 1.5 5.0 -0.2"
[2] "1,2"
[3] "G20 112.9 -4.1 1.6 5.7 0.4"
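A note on why the attempt above leaves the strings unchanged: a lookahead is zero-width, so (?!...) only matches empty positions in the string, and replacing an empty match with "" removes nothing. One possible workaround, sketched here under the assumption that extracting the matches is acceptable instead of deleting the non-matches, is to use gregexpr() with regmatches() and then paste the pieces back together. The names num_pattern, matches and MWE4 are only illustrative.

```r
MWE <- c(
  "World1 2.6 -4.5 1.5 5.0 -0.2",
  "1,2",
  "G20 112.9 -4.1 1.6 5.7 0.4"
)

# Same pattern as in the working gsub() call above.
num_pattern <- "[-+]?\\d+\\.\\d"

# gregexpr() locates every match in each string; regmatches() extracts them
# as a list holding one character vector per input element.
matches <- regmatches(MWE, gregexpr(num_pattern, MWE))

# Rebuild one string per element, each number preceded by a space.
# An element without any match (here "1,2") becomes the empty string.
MWE4 <- vapply(
  matches,
  function(m) if (length(m) > 0) paste0(" ", m, collapse = "") else "",
  character(1)
)
MWE4
```

On the sample data this should give the three strings listed as the desired output, with "" for the "1,2" element. If the values are needed as numbers rather than text, as.numeric() could be applied to each element of matches instead of pasting.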