
I have a character vector

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

I'm trying to remove both span and punctuation from every element of the vector, to get:

> something thank great to hear your

The thing is, there's no rule as to whether span appears before or after the word I'm interested in. Also, span can be glued to (i) characters only (e.g. yourspan), (ii) punctuation only (e.g. ..span?), or (iii) both characters and punctuation (e.g. somethingspan.).

I searched SO for an answer, but I mostly found requests to remove whole words (like here) or the parts of a string before/after a letter or punctuation mark (like here).

Any help would be appreciated.

Kasia Kulma

3 Answers


You can try out regular expressions interactively at https://regex101.com/.

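# remove every occurrence of "span"
clean_words <- gsub(pattern = "span", replacement = "", words)
# if you want a single sentence
sentence <- paste(clean_words, collapse = " ")

# remove punctuation: keep only the letters a-z, A-Z and spaces
clean_sentence <- gsub(pattern = "[^a-zA-Z ]", replacement = "", sentence)
# "..?" became an empty string, leaving a double space; squeeze repeated spaces
clean_sentence <- gsub(" +", " ", clean_sentence)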
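A compact single-pass variant, offered as a sketch rather than as part of this answer: one gsub call removes span and every punctuation character at once, and trimws() then filters out elements that end up empty. Like the answer above, it strips all punctuation (not only punctuation next to span), and it would also cut span out of longer words such as "spaniel".

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Remove every "span" and every punctuation character in a single pass.
# Note: this also strips punctuation that is NOT adjacent to "span".
cleaned <- gsub("span|[[:punct:]]", "", words)

# Drop elements that became empty or whitespace-only, then join with spaces
cleaned <- cleaned[trimws(cleaned) != ""]
sentence <- paste(cleaned, collapse = " ")
print(sentence)
```

Filtering before paste() avoids the double space that an empty element would otherwise leave behind.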
JavRoNu

You may use

[[:punct:]]*span[[:punct:]]*

See the regex demo.

Details

  • [[:punct:]]* - zero or more punctuation characters
  • span - the literal substring span
  • [[:punct:]]* - zero or more punctuation characters
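To see exactly what this pattern removes from each element, one quick check (an illustrative sketch, not part of the original answer) is to extract the matches with regexpr() and regmatches() before substituting anything:

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
pattern <- "[[:punct:]]*span[[:punct:]]*"

# Which elements contain a match at all?
has_span <- grepl(pattern, words)

# regexpr() locates the first match in each element (-1 when there is none);
# regmatches() returns only the matched substrings, skipping non-matches.
removed <- regmatches(words, regexpr(pattern, words))
print(has_span)
print(removed)
```

Only punctuation touching span is consumed, which is why "great to hear" passes through untouched.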

R Demo:

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ")  # Concat the elements
## => [1] "something thank great to hear your"

If whitespace-only elements remain after removing the unwanted substrings, replace the second step with words <- words[trimws(words) != ""] (instead of words[words != ""]).
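A small illustration of that case, assuming a hypothetical element " span " that is not in the question's data: after substitution it becomes two spaces, which words != "" keeps but trimws() filters out.

```r
# Hypothetical input: " span " is NOT from the question; it is added only to
# produce an element that becomes whitespace-only after substitution.
words <- c("somethingspan.", " span ", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words)
# words is now c("something", "  ", "your")

naive  <- paste(words[words != ""], collapse = " ")         # keeps "  ": extra spaces
robust <- paste(words[trimws(words) != ""], collapse = " ")  # drops it
print(naive)
print(robust)
```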

Wiktor Stribiżew

Use sub to remove span, then use paste with collapse to turn the result into a sentence:

library(magrittr)

sub("^[[:punct:]]{,2}span|span[[:punct:]]{,2}$", "", words)  %>% paste(collapse=" ")

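This only removes span at the beginning or the end of an element (together with up to two adjacent punctuation characters), so the trailing ? of "..span?" is kept: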

Output

[1] "something ? thank great to hear your"
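If the stray ? is not wanted, one option (a sketch, not the answerer's code) is to allow any run of punctuation on both sides of each anchored span, and to use gsub so that a span at each end of an element is handled. It still only removes span at the start or end of an element.

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Same idea as the answer (span only at the start or end of an element), but
# with [[:punct:]]* on BOTH sides of each anchored "span", so "..span?" goes whole.
anchored <- "^[[:punct:]]*span[[:punct:]]*|[[:punct:]]*span[[:punct:]]*$"
cleaned <- gsub(anchored, "", words)

# "..span?" is now an empty string; drop it before joining
cleaned <- cleaned[cleaned != ""]
sentence <- paste(cleaned, collapse = " ")
print(sentence)
```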
Andre Elrico
  • `"^span|span$"` will not handle `"somethingspan."`: there is a `.` at the end. See the OP: *it can be followed by characters, punctuation, combination of the two, etc.* So even `[[:punct:]]?` before `$` won't help. The question is unclear. – Wiktor Stribiżew Dec 14 '17 at 10:16
  • Andre, the question is too unclear, but have a look at *it can be followed by characters, punctuation, combination of the two, etc.* Just `[[:punct:]]?` won't help. – Wiktor Stribiżew Dec 14 '17 at 10:21
  • @Wiktor, what's unclear about the question? I'll clarify it – Kasia Kulma Dec 14 '17 at 10:23
  • Yes, it's unclear. I guess the code provided by all of us should lead @Kasia to her goal. – Andre Elrico Dec 14 '17 at 10:24
  • @Kasia, please include ALL possibilities that can occur in your reproducible example. – Andre Elrico Dec 14 '17 at 10:25
  • @A5C1D2H2I1M1N2O1R2T1 I provided a dummy example of my data and a desirable output. This should answer all your questions above – Kasia Kulma Dec 14 '17 at 10:34