First we extract the first three words, or the first two words when they are followed by :, using stringr::str_extract. (You could also use base R's sub to match the full value and capture only the part you want, i.e. sub('^(expre).+$', '\\1', value).) The regex pattern is \w+ \w+(:| \w+), i.e. match two words (\w+ \w+), then match either : or a space followed by another word.
library(dplyr)    # for %>%, mutate() and group_by()
library(stringr)  # for str_extract()

df %>%
  mutate(beginnings = str_extract(value, "\\w+ \\w+(:| \\w+)")) %>%
  group_by(beginnings)
# A tibble: 7 x 3
# Groups: beginnings [3]
     ID value                                                      beginnings
  <int> <fct>                                                      <chr>
1     1 request body: <?xml version=2.0> values received           request body:
2     2 request body: <code> jnwg3425                              request body:
3     3 request body: <?xml version=2.0, <PlatCode>, <code> qwefn2 request body:
4     4 Error in message received                                  Error in message
5     5 Error in message received                                  Error in message
6     6 Push forward message x3535                                 Push forward message
7     7 Push forward message <MarkCheckMSG>                        Push forward message
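For comparison, here is a minimal base-R sketch of the sub() alternative mentioned above, using the same pattern. It assumes the values are plain strings; the vector below is transcribed from the printed output, since the original df is not shown. Note one behavioural difference: str_extract returns NA when nothing matches, whereas sub returns the input unchanged, so non-matching values would pass through as-is.

```r
# Values transcribed from the printed tibble (assumption: the real df$value may differ)
values <- c(
  "request body: <?xml version=2.0> values received",
  "request body: <code> jnwg3425",
  "request body: <?xml version=2.0, <PlatCode>, <code> qwefn2",
  "Error in message received",
  "Error in message received",
  "Push forward message x3535",
  "Push forward message <MarkCheckMSG>"
)

# Anchor at the start, capture "two words then ':' or another word" as group 1,
# and let .*$ consume the rest so the whole string is replaced by group 1.
# perl = TRUE selects the PCRE engine for the match.
beginnings_sub <- sub("^(\\w+ \\w+(:| \\w+)).*$", "\\1", values, perl = TRUE)

beginnings_sub
```

The anchored ^...$ form is what makes sub behave like an extraction: everything outside the capture group is matched and thrown away.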
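Using a different regular expression, (\w+ )+[a-z]{2,}:?, we match as many words followed by a space as possible ((\w+ )+), followed by two or more lowercase letters ([a-z]{2,}) and a colon (:) if it exists. When the last word does not fit [a-z]{2,}, the regex engine backtracks by one word, which is why "Push forward message x3535" still gives "Push forward message", while "Error in message received" now keeps all four words.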
df %>%
  mutate(beginnings = str_extract(value, "(\\w+ )+[a-z]{2,}:?")) %>%
  group_by(beginnings)
# A tibble: 7 x 3
# Groups: beginnings [3]
     ID value                                                      beginnings
  <int> <fct>                                                      <chr>
1     1 request body: <?xml version=2.0> values received           request body:
2     2 request body: <code> jnwg3425                              request body:
3     3 request body: <?xml version=2.0, <PlatCode>, <code> qwefn2 request body:
4     4 Error in message received                                  Error in message received
5     5 Error in message received                                  Error in message received
6     6 Push forward message x3535                                 Push forward message
7     7 Push forward message <MarkCheckMSG>                        Push forward message
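To see exactly where the two patterns disagree, here is a base-R sketch that runs both over the same values. extract_first() is a hypothetical helper standing in for stringr::str_extract (first match per element, NA when there is none), built on regexpr() and regmatches(); the values vector is again transcribed from the printed output.

```r
# Values transcribed from the printed tibble (assumption: the real df$value may differ)
values <- c(
  "request body: <?xml version=2.0> values received",
  "request body: <code> jnwg3425",
  "request body: <?xml version=2.0, <PlatCode>, <code> qwefn2",
  "Error in message received",
  "Error in message received",
  "Push forward message x3535",
  "Push forward message <MarkCheckMSG>"
)

# Hypothetical helper: first match of `pattern` in each element, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)  # match start per element, -1 when no match
  hit <- m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)           # regmatches() returns only the matched elements
  out
}

two_words <- extract_first(values, "\\w+ \\w+(:| \\w+)")
greedy    <- extract_first(values, "(\\w+ )+[a-z]{2,}:?")

# Keep only the rows where the two patterns give different beginnings
comparison <- data.frame(value = values, two_words = two_words,
                         greedy = greedy, stringsAsFactors = FALSE)
comparison[two_words != greedy, ]
```

Only the "Error in message received" rows differ: the first pattern always stops after at most three words, while the second keeps consuming space-separated words as long as the final one consists of lowercase letters.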