
I know how to extract the second-to-last word of a string in Python, but I can't get it to work in R:

> string  <- "this is a sentence"
> pattern <- "\b([\w]+)[\s]+([\w]+)[\W]*?$"
Error: '\w' is an unrecognized escape in character string starting "\b([\w"
> match   <- regexec(pattern, string)
> words   <- regmatches(string, match)
> words
[[1]]
character(0)
  • If you check out [this feature list](http://www.regular-expressions.info/refflavors.html), by default, R doesn't do `\w`. It looks like if you set `perl = TRUE` it should work? I'll be honest, I don't know anything about R, so I don't know what that entails. Hopefully it's a simple fix. – Michelle Aug 21 '13 at 17:16
  • The pattern should be: `"\\b(\\w+)\\s+\\w+\\W*?$"` and then take the second component of the output. – G. Grothendieck Aug 21 '13 at 17:33
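For reference, here is a minimal sketch of the question's `regexec()` code with the pattern from the comment substituted in. The essential change is doubling each backslash: in an R string literal `\w` is not a valid escape, whereas `"\\w"` hands the two characters `\w` to the regex engine. This assumes R's default (TRE) engine, which accepts `\b`, `\w`, `\s` and `\W` as extensions; the variable name `m` is only illustrative.

```r
string  <- "this is a sentence"
# Each backslash is doubled so the regex engine receives \b, \w, \s and \W
pattern <- "\\b(\\w+)\\s+\\w+\\W*?$"
m       <- regexec(pattern, string)
words   <- regmatches(string, m)
# words[[1]] holds the full match ("a sentence") followed by the capture group
words[[1]][2]
#[1] "a"
```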

3 Answers

sub('.*?(\\w+)\\W+\\w+\\W*?$', '\\1', string)
#[1] "a"

which reads: be non-greedy and match anything until you reach the sequence of some word characters, some non-word characters, some word characters, optional non-word characters, and the end of the string; then extract the first run of word characters in that sequence.

eddi
  • You probably need to trim trailing spaces or optionally allow for them in the pattern. It's especially embarrassing that a period at the end of the "sentence" will sabotage this pattern. Perhaps: `".*?(\\w+)\\W+\\w+(\\W?)$"` – IRTFM Aug 21 '13 at 18:01
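The `sub()` answer can be wrapped in a small helper and checked against the trailing-period and trailing-space cases raised in the comment. This is a sketch, not part of the original answer: the name `second_to_last` is hypothetical, and it passes `perl = TRUE` so the lazy `.*?` follows PCRE's well-documented backtracking behaviour rather than relying on the default TRE engine.

```r
# Hypothetical helper wrapping the regex from the answer.
# sub() is vectorised, so x may be a character vector.
second_to_last <- function(x) {
  # The whole string is replaced by capture group 1 (the second-to-last word);
  # \\W*? before $ absorbs any trailing punctuation or spaces.
  sub(".*?(\\w+)\\W+\\w+\\W*?$", "\\1", x, perl = TRUE)
}

second_to_last(c("this is a sentence", "this is a sentence.", "this is a sentence  "))
#[1] "a" "a" "a"
```

A string with fewer than two words does not match, so `sub()` returns it unchanged (e.g. `"hello"` comes back as `"hello"`); check for that case if it matters.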

Non-regex solution:

string  <- "this is a sentence"
split <- strsplit(string, " ")[[1]]
split[length(split)-1]
mengeln
  • I initially solved it using strsplit, but I also wanted to figure out the regexpr approach. –  Aug 21 '13 at 17:45
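A slightly more forgiving variant of the `strsplit()` approach trims the string and splits on runs of whitespace (`"\\s+"`), so doubled or trailing spaces don't produce empty "words". This is a sketch: the name `second_to_last_word` and the choice to return `NA` for inputs with fewer than two words are assumptions, not part of the answer.

```r
# Hypothetical helper: second-to-last whitespace-separated token of each string.
second_to_last_word <- function(x) {
  # trimws() drops leading/trailing whitespace; "\\s+" splits on any run of it
  tokens <- strsplit(trimws(x), "\\s+")
  vapply(tokens, function(w) {
    if (length(w) >= 2) w[length(w) - 1] else NA_character_
  }, character(1))
}

second_to_last_word(c("this is a sentence", "this  is a   sentence  ", "word"))
#[1] "a" "a" NA
```

Unlike the regex answers, punctuation stays attached to its token: `"a sentence."` still yields `"a"`, but `"a sentence ."` would yield `"sentence"`.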

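Python non-regex version:

    string = "this is a sentence"
    spl = string.split()  # split on runs of whitespace
    if len(spl) > 1:      # need at least two words
        s = spl[-2]       # second-to-last word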
QuentinJS