I have this string, "sam buy expensive toys as 125898652". I would like to extract the word after "as", which is "125898652".

I'm using

(?<=as\s)+[^\s]+

I tried it on https://regex101.com/r/NaWAl1/1 and it works well there. When I run it in R, it returns this error:

Error: '\s' is an unrecognized escape in character string starting ""(?<='as'\s"

So I modified it to

(?<='CR'\s)+[^\s]+

It returns a different error:

Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

Can someone please explain why regex behaves differently in R and how to make this work? Thank you so much.

ThomasIsCoding
yuliansen
  • `stringi::stri_extract_first_regex("sam buy expensive toys as 125898652","(?<=as\\s)[^\\s]+")` works well for your case. Do not quantify lookarounds, they are zero-width assertions. And use double backslashes in string literals to define literal backslashes. – Wiktor Stribiżew Dec 20 '19 at 10:58
  • I've also used double backslashes for each backslash there, but it still doesn't work – yuliansen Dec 23 '19 at 01:32
  • [`(?<=as\s)+[^\s]+` works well](https://rextester.com/LJF82742) – Wiktor Stribiżew Dec 23 '19 at 10:40
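
To illustrate the two points in the first comment (double the backslashes in the string literal, and do not quantify the lookbehind), here is a minimal base-R sketch. It is not from the original thread: it swaps in `regmatches()`/`regexpr()` for the stringi call so it runs without extra packages, and the raw-string form at the end assumes R 4.0.0 or later.

```r
s <- "sam buy expensive toys as 125898652"

# "\\s" in an R string literal is the two characters \s once parsed,
# which is what the regex engine needs to see.
pattern <- "(?<=as\\s)[^\\s]+"
writeLines(pattern)  # prints (?<=as\s)[^\s]+

# Lookbehind needs perl = TRUE in base R. Note there is no quantifier
# after the lookbehind group.
result <- regmatches(s, regexpr(pattern, s, perl = TRUE))
print(result)  # [1] "125898652"

# Assumption: R >= 4.0.0, where raw strings avoid the doubled backslashes.
pattern_raw <- r"((?<=as\s)[^\s]+)"
identical(pattern, pattern_raw)  # TRUE
```

The pattern from regex101 is unchanged; only the way it is written inside an R string differs.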

2 Answers

Using sub

sub(".*as\\s(\\w+).*", "\\1", "sam buy expensive toys as 125898652")
#[1] "125898652"

Or lookbehind regex

stringr::str_extract("sam buy expensive toys as 125898652", "(?<=as\\s)\\w+")
#[1] "125898652"

For numbers that contain commas and may have decimal places, we can do

x <- "sam buy expensive toys as 128984,45697.00"
sub(".*as\\s(\\d+\\.?\\d+).*", "\\1",gsub(',', '', x))
#[1] "12898445697.00"
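
One caveat with the `sub()` approach above: when the pattern does not match, `sub()` returns the input unchanged rather than a missing value. The following is a minimal base-R sketch of a variant that returns `NA` instead. `extract_after_as` is a hypothetical helper name, and the second input string is made up for illustration.

```r
x <- c("sam buy expensive toys as 125898652", "no marker here")

# sub() leaves non-matching elements untouched.
sub(".*as\\s(\\w+).*", "\\1", x)

# Hypothetical helper: return NA where "as " is not followed by a word.
extract_after_as <- function(x) {
  m <- regexpr("(?<=as\\s)\\w+", x, perl = TRUE)  # -1 where there is no match
  out <- rep(NA_character_, length(x))
  hit <- m != -1L
  out[hit] <- regmatches(x, m)  # regmatches() returns matched elements only
  out
}

extract_after_as(x)
```

This mirrors what `stringr::str_extract()` does for non-matching input, without needing the package.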
Ronak Shah

With base R, given string s <- "sam buy expensive toys as 125898652", you can use gsub() or strsplit():

> gsub(".*?as\\s","",s)
[1] "125898652"

or

> unlist(strsplit(s, split = "(?<=as\\s)", perl = TRUE))[2]
[1] "125898652"
ThomasIsCoding