I want to use gsub to remove a sentence that appears with many variations in spacing.

My text is

Yes, please periodically send me e-mail updates

I want to remove this sentence. But I have lots of variations of this in my corpus. For example, I have

Yes, please periodically send me e-mail  updates
Yes, please periodically send me  e-mail updates
Yes, please periodically  send me e-mail updates

How can I remove these sentences with a regular expression? I tried to specify every case, as in the following code.

gsub("Yes, please periodically send me  e-mail updates", "", text)        
gsub("Yes, please periodically send me e-mail  updates", "", text)        
gsub("Yes, please periodically  send me e-mail updates", "", text)        

I believe there is a better way to remove these sentences with a single call. Thank you for any help!

user3077008

3 Answers

3

Use [[:space:]]+ to match one or more whitespace characters (spaces, tabs, or newlines) between words.

gsub("Yes, please periodically[[:space:]]+send[[:space:]]+me[[:space:]]+e-mail[[:space:]]+updates", "", text)
Avinash Raj
  • I was going to suggest something similar, using `gsub("\\s", "\\\\s+", pattern)` as the "pattern", where `pattern <- "Yes, please periodically send me e-mail updates"`. Should save typing `[[:space:]]+` repeatedly. – A5C1D2H2I1M1N2O1R2T1 Mar 01 '15 at 09:35
2

Maybe I am misunderstanding the question, but would it not be simpler to match everything from "Yes," to "updates", which covers all possible extra spaces?

text <- c("Yes, please periodically send me e-mail  updates",
          "Yes, please periodically send me  e-mail updates", 
          "Yes, please periodically  send me e-mail updates")
gsub("^Yes,.*updates", "", text)
[1] "" "" ""
lawyeR
  • Not if other sentences with that beginning and end exist! example: "Yes, I would like you to only delete sentences requesting periodic e-mail updates" – ping Mar 01 '15 at 11:31
  • @ping, good point. I suppose I could add some lookaheads or lookbehinds, but if these are free-text answers, one could never cover all possibilities. But the OP's three strings were quite structured. – lawyeR Mar 01 '15 at 11:40
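
To make the objection in these comments concrete, here is a small sketch that compares the wildcard pattern with a whitespace-tolerant one on ping's example sentence. The variable names `broad` and `strict` are made up for this illustration.

```r
# One real target sentence plus ping's counterexample.
text <- c("Yes, please periodically  send me e-mail updates",
          "Yes, I would like you to only delete sentences requesting periodic e-mail updates")

# Wildcard pattern from the answer: matches anything between "Yes," and "updates".
broad <- gsub("^Yes,.*updates", "", text)

# Whitespace-tolerant pattern: only the spacing between the words may vary.
strict <- gsub("Yes, please periodically\\s+send\\s+me\\s+e-mail\\s+updates", "", text)
```

Here `broad` empties both strings, while `strict` empties only the first and leaves ping's sentence untouched.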
1
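library(magrittr)
text_to_remove <- "Yes, please periodically send me e-mail updates"
# Collapse each run of whitespace to one space, then remove the sentence.
# The dot puts the piped value into gsub()'s third argument (x).
text %>% gsub("[[:space:]]+", " ", .) %>% gsub(text_to_remove, "", ., fixed = TRUE)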

A bit of a "silly" approach. Store the string you want to remove, written with single spaces only. Collapse every run of spaces in the original text to a single space, then replace the stored string with "". Note that this also normalizes the whitespace in the rest of the text.

dimitris_ps