I am trying to extract the full regex match in R, but I only seem to get the first portion of the string.
Using http://regexpal.com/ I can confirm that my regex matches what I expect. In my data, the "error type" is found between the number preceded by an asterisk and the next comma, so I'm looking to return "*20508436572 access forbidden by rule" in the first instance and "*20508436572 some_error" in the second.
Example:
library(stringr)
regex.errortype<-'\\*\\d+\\s[^,\\n]+'
test_string1<-'2014/08/07 08:28:56 [error] 21278#0: *20508436572 access forbidden by rule, client: 111.222.111.222'
test_string2<-'2014/08/07 08:28:56 [error] 21278#0: *20508436572 some_error, client: 111.222.111.222'
str_extract(test_string1, regex.errortype)
str_extract_all(test_string1, regex.errortype)
regmatches(test_string1, regexpr(regex.errortype, test_string1))
str_extract(test_string2, regex.errortype)
str_extract_all(test_string2, regex.errortype)
regmatches(test_string2, regexpr(regex.errortype, test_string2))
Results:
> str_extract(test_string1, regex.errortype)
[1] "*20508436572 access forbidde"
> str_extract_all(test_string1, regex.errortype)
[[1]]
[1] "*20508436572 access forbidde"
> regmatches(test_string1, regexpr(regex.errortype, test_string1))
[1] "*20508436572 access forbidde"
> str_extract(test_string2, regex.errortype)
[1] "*20508436572 some_error"
> str_extract_all(test_string2, regex.errortype)
[[1]]
[1] "*20508436572 some_error"
> regmatches(test_string2, regexpr(regex.errortype, test_string2))
[1] "*20508436572 some_error"
As you can see, the longer match is truncated, but the shorter one is correctly parsed.
Am I missing something here, or is there some other method to get the full match back?
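For comparison, here is a minimal sketch of two variants that may avoid the truncation. It rests on an unconfirmed assumption: that the default regex engine reads `\\n` inside the bracket expression as something other than a newline, which would be consistent with the match stopping just before the first "n". The names match_perl, regex.nocomma and match_nocomma are made up for this illustration.

```r
# Sketch only: assumes the truncation comes from how `\\n` is interpreted
# inside [...] by the default (non-PCRE) engine. That cause is not confirmed.
test_string1 <- '2014/08/07 08:28:56 [error] 21278#0: *20508436572 access forbidden by rule, client: 111.222.111.222'

# Variant 1: same pattern, matched with the PCRE engine (perl = TRUE),
# where \n inside a character class means a newline character.
regex.errortype <- '\\*\\d+\\s[^,\\n]+'
match_perl <- regmatches(test_string1,
                         regexpr(regex.errortype, test_string1, perl = TRUE))

# Variant 2: drop the newline exclusion altogether and stop only at a comma.
# This assumes each log entry sits on a single line.
regex.nocomma <- '\\*\\d+\\s[^,]+'
match_nocomma <- regmatches(test_string1,
                            regexpr(regex.nocomma, test_string1))

print(match_perl)
print(match_nocomma)
```

Variant 1 keeps the original pattern and only switches engines, while variant 2 keeps the default engine and simplifies the pattern. Which one fits better depends on whether the input can contain embedded newlines.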
Cheers,
Andy.