Extract 2nd to last word in string

Question

I know how to do this in Python, but I can't get it to work in R:

> string  <- "this is a sentence"
> pattern <- "\b([\w]+)[\s]+([\w]+)[\W]*?$"
Error: '\w' is an unrecognized escape in character string starting "\b([\w"
> match   <- regexec(pattern, string)
> words   <- regmatches(string, match)
> words
[[1]]
character(0)

If you check out [this feature list](http://www.regular-expressions.info/refflavors.html), by default, R doesn't do `\w`. It looks like if you set `perl=TRUE` it should work? I'll be honest, I don't know anything about R, so I don't know what that entails. Hopefully it's a simple fix. – Michelle, Aug 21 '13 at 17:16
The pattern should be: `"\\b(\\w+)\\s+\\w+\\W*?$"` and then take the second component of the output. — G. Grothendieck, Aug 21 '13 at 17:33
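
In an R string literal the backslash is itself an escape character, so every regex backslash has to be doubled; that is what the `'\w' is an unrecognized escape` error reports. Below is a minimal sketch of the question's `regexec` approach with the doubled-backslash pattern from the comment above. The variable `second_to_last` is introduced here for illustration only.

```r
string  <- "this is a sentence"
# Each regex backslash is doubled so R's string parser passes it through.
pattern <- "\\b(\\w+)\\s+\\w+\\W*?$"
match   <- regexec(pattern, string)
words   <- regmatches(string, match)
# words[[1]][1] is the whole match; words[[1]][2] is the first capture group.
second_to_last <- words[[1]][2]
second_to_last
```

For the sample string this should give `"a"`, the same result as the accepted answer.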

eddi · Accepted Answer · 2013-08-21T18:21:04.067

6

sub('.*?(\\w+)\\W+\\w+\\W*?$', '\\1', string)
#[1] "a"

This reads: lazily match anything until you reach the sequence of some word characters, some non-word characters, some word characters, optional non-word characters, and the end of the string. Then extract the first run of word characters in that sequence.
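
One caveat is that `sub` returns its input unchanged when the pattern does not match, so a one-word string comes back as is. The sketch below wraps the same pattern in a hypothetical helper, `second_to_last_word` (not part of the answer), that returns `NA` in that case. Passing `perl = TRUE` is an assumption made here so the lazy quantifiers follow PCRE semantics; the answer itself uses R's default engine.

```r
# Hypothetical helper: same pattern as the answer, but yields NA
# instead of echoing the input when there are fewer than two words.
second_to_last_word <- function(x) {
  pattern <- ".*?(\\w+)\\W+\\w+\\W*?$"
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

# Trailing punctuation and spaces are absorbed by the final \\W*? part.
second_to_last_word(c("this is a sentence",
                      "this is a sentence.",
                      "Hello, world!  ",
                      "single"))
```

The helper is vectorised because `grepl`, `sub`, and `ifelse` all work element-wise on character vectors.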

edited Aug 21 '13 at 18:21

answered Aug 21 '13 at 17:17

You probably need to trim trailing spaces or optionally allow detection in the pattern. It's especially embarrassing that a period at the end of the "sentence" will sabotage this pattern. Perhaps: `".*?(\\w+)\\W+\\w+(\\W?)$"` – IRTFM Aug 21 '13 at 18:01

score 5 · Answer 2 · answered Aug 21 '13 at 17:43

Non-regex solution:

string  <- "this is a sentence"
split <- strsplit(string, " ")[[1]]
split[length(split)-1]
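
The `strsplit` version above assumes single spaces and at least two words. With one word, `length(split) - 1` is 0 and the index returns `character(0)`. Below is a slightly more defensive sketch; the helper name `second_to_last_split` is hypothetical, and splitting on runs of whitespace after trimming is an assumption about what counts as a word separator.

```r
# Hypothetical helper: split on runs of whitespace and guard short inputs.
second_to_last_split <- function(x) {
  # trimws() drops leading/trailing whitespace so no empty pieces appear.
  words <- strsplit(trimws(x), "\\s+")[[1]]
  if (length(words) < 2) return(NA_character_)
  words[length(words) - 1]
}

second_to_last_split("this is a sentence")
second_to_last_split("this  is   a sentence  ")
second_to_last_split("single")
```

Unlike the regex approach, punctuation stays attached to the word it touches, so this fits input that is already plain space-separated tokens.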

mengeln

I initially solved it using `strsplit`, but I also wanted to figure out the regex approach. – Aug 21 '13 at 17:45

score 0 · Answer 3 · answered Jan 29 '22 at 23:21

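Python non-regex version:

    t = "this is a sentence"
    spl = t.split()  # split on any whitespace, ignoring leading/trailing spaces
    # Need at least two words; on a shorter list, spl[-2] would raise IndexError
    s = spl[-2] if len(spl) > 1 else None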

QuentinJS
