I think this should work:
re <- regexpr(
"(?(?=.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?)(\\1|))",
z$x, perl = TRUE)
regmatches(z$x, re)
#[1] "112.68.196.98" "192.41.196.888" "" ""
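This uses a regex conditional. If the lookahead .*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*? finds an address somewhere ahead, the pattern tries to match that captured address (\\1) at the current position and otherwise falls back to an empty match; if no address is found, the match is empty. Because an empty match is always possible, every string yields a result (possibly ""), so the output stays aligned with z$x. One consequence: regexpr reports the first match, and here that is always at position 1, so an address is returned only when the string starts with it. A string such as '5.32 88 112.68.196.98' therefore gives "".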
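If all you need is the first address in each string, kept aligned with the input, a plain pattern plus an index assignment also works. This is a sketch of an alternative, not the conditional approach above; x simply copies the strings from the Data section, and ip_pat checks only the dotted-digit shape, not octet ranges.

```r
x <- c("112.68.196.98 5.32 192.41.196.888", "192.41.196.888",
       "..", "5.32 88 112.68.196.98")

# Dotted-quad shape only: 192.41.196.888 is accepted even though 888 > 255
ip_pat <- "\\d+\\.\\d+\\.\\d+\\.\\d+"

m <- regexpr(ip_pat, x, perl = TRUE)  # -1 where a string has no match
first_ip <- rep("", length(x))        # one slot per input string
first_ip[m > 0] <- regmatches(x, m)   # regmatches() drops the non-matches
first_ip
# c("112.68.196.98", "192.41.196.888", "", "112.68.196.98")
```

Here the fourth string also yields its address, since the match does not have to start at position 1.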
Update:
Regarding your comment, I think the following changes will allow you to capture multiple IP addresses in a single string. First, switch from regexpr to gregexpr to allow multiple results:
re2 <- gregexpr(
"(?(?=.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?)(\\1|))",
z2$x, perl = TRUE
)
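The difference between the two functions is easiest to see on a single string. The sketch below uses a plain dotted-quad pattern (ip_pat, not the conditional above) so that only real matches appear.

```r
x <- "112.68.196.98 5.32 192.41.196.888"
ip_pat <- "\\d+\\.\\d+\\.\\d+\\.\\d+"

# regexpr() stops at the first match in each string
one <- regmatches(x, regexpr(ip_pat, x, perl = TRUE))
# "112.68.196.98"

# gregexpr() finds every match; regmatches() then returns a list
# with one character vector per input string
every <- regmatches(x, gregexpr(ip_pat, x, perl = TRUE))
# every[[1]] is c("112.68.196.98", "192.41.196.888")
```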
Since calling regmatches on the output of gregexpr returns a list (one character vector per input string, here padded with many empty matches), some additional processing is required:
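res2 <- sapply(regmatches(z2$x, re2), function(x) {
  # join all matches (including the empty ones) with spaces,
  # squeeze repeated whitespace, then trim both ends
  gsub(
    "^\\s+|\\s+$", "",
    gsub("\\s+", " ", paste0(x, collapse = " "))
  )
})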
This should be suitable for, e.g., recombining with your data.frame as a new column:
res2
#[1] "112.68.196.98 192.41.196.888" "192.41.196.888"
#[3] ""                              "112.68.196.98"
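For the new-column use case, a shorter route may be to collapse each row's matches with paste. This is a sketch of an alternative that swaps the conditional for a plain dotted-quad pattern (ip_pat and the column name ips are hypothetical); it rebuilds z2 from the Data section so it runs on its own.

```r
z2 <- data.frame(
  x = c('112.68.196.98 5.32 192.41.196.888',
        '192.41.196.888',
        '..', '5.32 88 112.68.196.98'),
  stringsAsFactors = FALSE
)
ip_pat <- "\\d+\\.\\d+\\.\\d+\\.\\d+"  # shape only, no 0-255 check

# One character vector of matches per row (character(0) if none)
hits <- regmatches(z2$x, gregexpr(ip_pat, z2$x, perl = TRUE))

# paste(character(0), collapse = " ") is "", so rows without an address stay ""
z2$ips <- vapply(hits, paste, character(1), collapse = " ")
```

Because gregexpr records only non-empty matches here, no whitespace clean-up is needed after pasting.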
And if you did want to break out each result into its own string, the expression is a little simpler (compared to sapply(...)):
lapply(regmatches(z2$x, re2), function(x) {
Filter(function(y) y != "", x)
})
#[[1]]
#[1] "112.68.196.98" "192.41.196.888"
#[[2]]
#[1] "192.41.196.888"
#[[3]]
#character(0)
#[[4]]
#[1] "112.68.196.98"
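As a cross-check, for inputs like these a plain dotted-quad pattern with gregexpr appears to give the same per-string results without the conditional or the Filter() step. This is an alternative sketch, not the method above; x copies the Data strings and the comparison only covers these four inputs.

```r
x <- c("112.68.196.98 5.32 192.41.196.888", "192.41.196.888",
       "..", "5.32 88 112.68.196.98")

# Conditional pattern from the answer, followed by the Filter() clean-up
cond <- regmatches(x, gregexpr(
  "(?(?=.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?)(\\1|))", x, perl = TRUE))
cond <- lapply(cond, function(y) Filter(function(s) s != "", y))

# Plain pattern: only non-empty matches are recorded, so nothing to filter
plain <- regmatches(x, gregexpr("\\d+\\.\\d+\\.\\d+\\.\\d+", x, perl = TRUE))

# Same number of addresses per string, and the same addresses in order
same <- identical(lengths(cond), lengths(plain)) &&
  identical(unlist(cond), unlist(plain))
```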
Data:
z2 <- data.frame(
x = c('112.68.196.98 5.32 192.41.196.888',
'192.41.196.888',
'..', '5.32 88 112.68.196.98'),
stringsAsFactors = FALSE
)