Problem with negative lookahead in R regex

Question

I have this kind of data:

MWE <- c(
  "World1    2.6       -4.5         1.5          5.0       -0.2",
  "1,2",
  "G20    112.9            -4.1                1.6                        5.7                    0.4"
)

The desired output is:

[1] "    2.6       -4.5         1.5          5.0       -0.2"                                                      
[2] ""                                                                                                               
[3] "   112.9                         -4.1                    1.6                        5.7                    0.4"

I want to separate what is a number from what is not. (In this particular case, "1,2" is a data-mining "mistake": it refers to footnotes for "G20". I mention it only to make clear that it is not a number I want to keep.)

I think the correct regex for this number format is therefore `[-+]?\\d+\\.\\d`.

It works in the positive sense, i.e. for matching:

> MWE2 <- gsub("[-+]?\\d+\\.\\d","blah",MWE)  
> MWE2
[1] "World1    blah       blah         blah          blah       blah"                                                     
[2] "1,2"                                                                                                                 
[3] "G20    blah                         blah                    blah                        blah                    blah"
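To check which whitespace-separated tokens fit that format, one option is to split each string and test every token with the pattern anchored at both ends (`^...$`), so a token only counts if it consists entirely of such a number (an unanchored pattern would also accept a token like `x2.5y`). This is a minimal sketch assuming fields are always separated by whitespace; `tokens` and `nums` are illustrative names, not part of the original code.

```r
MWE <- c(
  "World1    2.6       -4.5         1.5          5.0       -0.2",
  "1,2",
  "G20    112.9            -4.1                1.6                        5.7                    0.4"
)

# Split each string into whitespace-separated tokens
# (assumes fields are always separated by whitespace)
tokens <- strsplit(MWE, "\\s+")

# Keep a token only if the WHOLE token has the number format;
# the ^ and $ anchors reject tokens that merely contain such a number
nums <- lapply(tokens, function(tok) tok[grepl("^[-+]?\\d+\\.\\d$", tok, perl = TRUE)])
print(nums)
```

The "1,2" line yields an empty character vector, since none of its tokens has the `digits.digit` shape.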

But when I try to isolate the values by replacing everything that does not match this pattern with nothing, using a negative lookahead `(?! )` (I understood from there that this was what I was looking for), i.e. `(?![-+]?\\d+\\.\\d)`, it does not seem to work (I looked here and added the `perl=T` option):

> MWE3 <- gsub("(?![-+]?\\d+\\.\\d)","",MWE,perl=T)  
> MWE3
[1] "World1    2.6       -4.5         1.5          5.0       -0.2"                                                      
[2] "1,2"                                                                                                               
[3] "G20    112.9                         -4.1                    1.6                        5.7                    0.4"
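The lookahead version leaves the strings unchanged because `(?!...)` is zero-width: it matches an empty position (any position not followed by a number), so `gsub` replaces empty strings with `""` and nothing is removed. One way to keep only the numbers while preserving the spacing, sketched below as an alternative technique rather than a repair of the lookahead, is to match either a number (captured in group 1) or any other single non-space character, and replace every match with `\\1`: numbers are put back, all other non-space characters are deleted, and whitespace is never matched. The `\\b` word boundaries are borrowed from the suggestion in the comments; this assumes numbers are always separated from other text by whitespace.

```r
MWE <- c(
  "World1    2.6       -4.5         1.5          5.0       -0.2",
  "1,2",
  "G20    112.9            -4.1                1.6                        5.7                    0.4"
)

# A zero-width negative lookahead only matches empty positions,
# so replacing those matches with "" cannot change the strings
unchanged <- gsub("(?![-+]?\\d+\\.\\d)", "", MWE, perl = TRUE)

# Alternative: match a number (group 1) OR any single non-space character.
# Replacing with "\\1" re-inserts numbers and deletes every other non-space
# character; an unmatched group 1 contributes an empty string.
kept <- gsub("([-+]?\\b\\d+\\.\\d\\b)|\\S", "\\1", MWE, perl = TRUE)
print(kept)
```

Because spaces are left alone, each number stays separated from its neighbours by at least one space, and the "1,2" line becomes an empty string.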

There is no problem with the negative lookahead; it is working as expected. What is the expected output? — Wiktor Stribiżew, Oct 15 '20 at 08:59
I added it, but I will also look into it, as it seems I have not understood something. — Anthony Martin, Oct 15 '20 at 09:01
Do you really want to keep the spacing? Try `sapply(regmatches(MWE, gregexpr("-?\\b\\d+\\.\\d\\b", MWE)), function(x) paste(x, collapse=" "))` — Wiktor Stribiżew, Oct 15 '20 at 09:07
`^(?!www\.petroules\.com$).*$` matches any *full* string that is not equal to `www.petroules.com` - this is not your scenario. — Wiktor Stribiżew, Oct 15 '20 at 09:09
I do not necessarily want to keep the spacing; I just need at least one space to be able to separate the values later into a database. — Anthony Martin, Oct 15 '20 at 09:18
Then why not just extract them? `regmatches(MWE, gregexpr("-?\\b\\d+\\.\\d\\b", MWE))` or `stringr::str_extract_all(MWE, "-?\\b\\d+\\.\\d\\b")`, see https://ideone.com/XeEvXi — Wiktor Stribiżew, Oct 15 '20 at 09:20
Because I did not know it existed and used a more laborious approach :) — Anthony Martin, Oct 15 '20 at 09:33
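Following the extraction suggestion in the comments, the matches can then be converted to numeric vectors and, when every remaining row has the same number of values, stacked into a matrix ready to be loaded as a table. This is a sketch under two assumptions: rows with no numbers (such as the "1,2" footnote line) are simply dropped, and all data rows have equal length; `vals` and `tab` are illustrative names. `perl = TRUE` is added here only to use the PCRE engine explicitly.

```r
MWE <- c(
  "World1    2.6       -4.5         1.5          5.0       -0.2",
  "1,2",
  "G20    112.9            -4.1                1.6                        5.7                    0.4"
)

# Extract every number in the expected format from each string (base R)
vals <- regmatches(MWE, gregexpr("-?\\b\\d+\\.\\d\\b", MWE, perl = TRUE))

# Convert each row's matches from character to numeric
vals <- lapply(vals, as.numeric)

# Drop rows with no numbers (assumption: footnote-only lines are discarded)
vals <- vals[lengths(vals) > 0]

# Stack into a matrix, one row per data line (assumes equal row lengths)
tab <- do.call(rbind, vals)
print(tab)
```

If rows can have different lengths, keeping `vals` as a list (or padding with `NA`) would be safer than `rbind`.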
