
I need to extract parts of a string in R based on a symbol and the word that follows it. I have a string such as

s <- "++can+you+please-help +me"

and the output would be:

"+ can" "+you" "+please" "-help" "+me"

where each word is shown together with the symbol that precedes it. I've tried the strsplit and sub functions, but I'm struggling to get the output that I want. Can you please help me? Thanks!

Sotos
ACLAN
  • Please share what you tried in order not to repeat the same. – Wiktor Stribiżew Aug 17 '17 at 13:57
  • https://stackoverflow.com/questions/15573887/split-string-with-regex – Olivia Aug 17 '17 at 14:07
  • Why is there a space in `"+ can"`? Did you intend to remove one of the plus signs and replace it with that space? `unlist(strsplit(s, split="(?<=\\w)\\s*(?=[+-]+)", perl=T))` gets very close. – Abdou Aug 17 '17 at 14:09
  • The space between "+" and "can" was a typing error; my apologies for that. My intention was to keep only one "+" out of the "++" and get "+can", not "+ can". – ACLAN Aug 18 '17 at 10:46

2 Answers


Use stri_match_all from the stringi package:

library(stringi)
result = unlist(stri_match_all(regex = "\\W\\w+",str = s))

Result

> result
[1] "+can"    "+you"    "+please" "-help"   "+me" 

No symbols

If you only want the words (no symbols), do:

result = unlist(stri_match_all(regex = "\\w+",str = s))

result
[1] "can"    "you"    "please" "help"   "me" 
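
If the sign and the word are needed as separate pieces, one possible sketch (an assumption about the use case, not something the question states) is to extract the signed tokens first and then split off the leading character. It uses base R only, and the names `tokens`, `signs` and `words` are purely illustrative.

```r
s <- "++can+you+please-help +me"

# One non-word character followed by a run of word characters,
# i.e. the same pattern as above, applied with base R.
tokens <- regmatches(s, gregexpr("\\W\\w+", s, perl = TRUE))[[1]]

# The first character of each token is the sign, the rest is the word.
signs <- substr(tokens, 1, 1)
words <- substring(tokens, 2)

signs
# [1] "+" "+" "+" "-" "+"
words
# [1] "can"    "you"    "please" "help"   "me"
```

Keeping the two vectors aligned by position makes it easy to combine them later, for example with `data.frame(sign = signs, word = words)`.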
R. Schifini

Here is one option using base R:

regmatches(s, gregexpr("[[:punct:]]\\w+", s))[[1]]  
#[1] "+can"    "+you"    "+please" "-help"   "+me"    
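
The pattern above accepts any punctuation character in front of a word. If only "+" and "-" should count as valid symbols (an assumption, since the question only shows those two), a narrower character class may be a safer sketch. The helper name `extract_signed_words` is hypothetical and not part of the original answer.

```r
# Hypothetical helper: return every word that is directly preceded
# by a "+" or "-", keeping that sign attached to the word.
extract_signed_words <- function(x) {
  # "[+-]" matches exactly one plus or minus; "\\w+" matches the word.
  regmatches(x, gregexpr("[+-]\\w+", x))[[1]]
}

s <- "++can+you+please-help +me"
extract_signed_words(s)
# [1] "+can"    "+you"    "+please" "-help"   "+me"
```

With `[[:punct:]]`, an input such as "a.b" would also yield ".b", whereas the narrower class ignores it.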
akrun
  • Thank you! It works. Is there a website with a clear explanation and good examples of how to write the kind of patterns I'm looking for in R? – ACLAN Aug 18 '17 at 10:49