
I have a vector vec in which some elements contain a punctuation mark. I want to return all elements that contain a punctuation mark, except those that contain an asterisk.

vec <- c("a,","abc","ef","abc-","abc|","abc*01")
vec[grepl("[^*][[:punct:]]", vec)]
[1] "a,"     "abc-"   "abc|"   "abc*01"

Why does it return "abc*01" when the pattern contains the negated class [^*]?

Wiktor Stribiżew

2 Answers


Maybe you can try grep, as shown below:

grep("\\*",grep("[[:punct:]]",vec,value = TRUE), value = TRUE,invert = TRUE) # nested `grep`s for double filtering

or

grep("[^\\*[:^punct:]]",vec,perl = TRUE, value = TRUE) # but this will fail for case `abc*01|` (thanks for feedback from @Tim Biegeleisen)

which gives

[1] "a,"   "abc-" "abc|"
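As a rough check of the difference between the two calls, the sketch below runs both on an extended vector. The name `vec2` and the extra element `abc*01|` are assumptions added for illustration; they are not part of the original data.

```r
# vec2 is a hypothetical test vector: the original vec plus the mixed case
vec2 <- c("a,", "abc", "ef", "abc-", "abc|", "abc*01", "abc*01|")

# Nested grep: keep elements with punctuation, then drop those containing *
nested <- grep("\\*", grep("[[:punct:]]", vec2, value = TRUE),
               value = TRUE, invert = TRUE)

# Single character class: matches any element holding a non-* punctuation
# mark, so an element with both * and | still gets through
single <- grep("[^\\*[:^punct:]]", vec2, perl = TRUE, value = TRUE)

print(nested)  # "a," "abc-" "abc|"
print(single)  # "a," "abc-" "abc|" "abc*01|"
```

The nested version filters on two separate conditions, which is why it is not fooled by an element that contains both kinds of punctuation.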
ThomasIsCoding
  • This logic _fails_ and passes the value `abc*01|`. Your regex is saying to assert a non `*` punctuation character, but it does not eliminate strings having both `*` and non `*` punctuation. – Tim Biegeleisen May 11 '20 at 07:43
  • @TimBiegeleisen Thank you again, now I see it. Do you have any tips on slightly modifying this regex? Would be greatly appreciated! – ThomasIsCoding May 11 '20 at 08:05
  • @TimBiegeleisen I updated my answer with a "stupid" nested `grep`. Your further feedback is welcome! :) – ThomasIsCoding May 11 '20 at 08:40

You could use grepl here:

vec <- c("a,","abc-","abc|","abc*01")
vec[grepl("^(?!.*\\*).*[[:punct:]].*$", vec, perl=TRUE)]

[1] "a,"   "abc-" "abc|"

The regex pattern ^(?!.*\\*).*[[:punct:]].*$ matches only strings that contain no asterisk and at least one punctuation character:

^                from the start of the string
    (?!.*\*)     assert that no * occurs anywhere in the string
    .*           match any content
    [[:punct:]]  match any single punctuation character (never *, which the lookahead has already ruled out)
    .*           match any content
$                end of the string
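The breakdown above can be checked with a short sketch. It first shows which substring the question's pattern actually matched, then applies the lookahead pattern, and finally expresses the same rule as two plain tests. The vector `vec2` (including the extra element `abc*01|`) and the variable names are assumptions made for illustration.

```r
# vec2 is a hypothetical test vector, not the asker's actual data
vec2 <- c("a,", "abc", "ef", "abc-", "abc|", "abc*01", "abc*01|")

# What the question's pattern matched: [^*] consumes one non-* character,
# and [[:punct:]] then matches the * that follows it
m <- regmatches(vec2, regexpr("[^*][[:punct:]]", vec2))
print(m)  # "a," "c-" "c|" "c*" "c*"

# Lookahead version: no * anywhere, and at least one punctuation character
keep <- grepl("^(?!.*\\*).*[[:punct:]].*$", vec2, perl = TRUE)
print(vec2[keep])  # "a," "abc-" "abc|"

# The same rule as two plain tests (fixed = TRUE treats * as a literal)
keep2 <- grepl("[[:punct:]]", vec2) & !grepl("*", vec2, fixed = TRUE)
print(vec2[keep2])  # "a," "abc-" "abc|"
```

In other words, `[^*]` only constrains the single character it consumes (here `c`), not the character that `[[:punct:]]` matches next, which is why `abc*01` was returned in the question.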
Tim Biegeleisen
  • Sorry for the incomplete information, but my vector also contains elements with no punctuation marks. I don't want them in my output. **I want only elements with punctuation except asterisk** – Dhwani Dholakia May 11 '20 at 07:09
  • @DhwaniDholakia Then check the updated answer. So if I understand correctly, the rule is, it must have punctuation, but it also must _not_ have `*`. Is that right? – Tim Biegeleisen May 11 '20 at 07:13
  • @Tim Biegeleisen Thanks a lot, I got my answer. It would be great if you could explain the pattern you created. – Dhwani Dholakia May 11 '20 at 07:16