I'd like to match all punctuation except single quotes.

I've tried the following.

  • /[^'[:punct:]]/ negates all punctuation.
  • /[(^')[:punct:]]/ seems to completely ignore ^'.

If there isn't a way to do this, I guess I can always just write out the full [:punct:] set except for the '.
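
For reference, a minimal sketch that reproduces both attempts. The sample string and the helper name `scan_with` are only illustrative, not part of any library. A `^` negates a bracket expression only when it is the first character, and parentheses do not group inside brackets, so the second pattern simply treats `(`, `^`, `'` and `)` as literal members of the class.

```ruby
ATTEMPT_SAMPLE = "Mr. O'Brien!"  # hypothetical sample text

# Returns every single-character match of `re` in `text`.
def scan_with(re, text = ATTEMPT_SAMPLE)
  text.scan(re)
end

# Attempt 1: the leading ^ negates the whole class, so this matches
# everything that is neither an apostrophe nor punctuation.
p scan_with(/[^'[:punct:]]/).join   # "Mr OBrien"

# Attempt 2: ( ^ ' ) are literal class members here, so the apostrophe
# is still matched along with the rest of the punctuation.
p scan_with(/[(^')[:punct:]]/)      # [".", "'", "!"]
```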

sawa
  • As mentioned in answers below, you probably want to use [Negative Lookahead](https://stackoverflow.com/questions/27691225/understanding-negative-lookahead) – B. Morris Dec 03 '18 at 05:27
  • Yes, this question is a duplicate, but, as is oft the case, closing it has the unfortunate effect of burying a good answer that had not been given to the duped question, namely the one given by @revo. revo, I encourage you to post your answer as an answer to the earlier question. – Cary Swoveland Dec 03 '18 at 18:16
  • @CarySwoveland Could you add a comment on why Revo's answer is better? (If there is a reason besides being supported in older versions) Sorry for the duplicate. I searched both ruby docs and SO but somehow missed both. – not_user9123 Dec 03 '18 at 18:33
  • I did not say @revo's answer was better. I'm just saying that it deserves to be seen. It may be more efficient than Amadan's (which could be important in some applications), and readers may find the technique of using a negative lookahead, regardless of whether it is followed by a character class, useful in other applications. Don't apologize for not having found the duplicate. I've seen many duplicates that contain innovative solutions to problems that were not suggested in the duped questions. We are richer for those answers. – Cary Swoveland Dec 03 '18 at 20:23
  • @CarySwoveland: Hey, _I_ haven't seen the duplicate... and _I wrote it_! (Completely and utterly forgot it existed, even now I can't remember writing it.) I agree revo's answer deserves to be seen, if nothing else then for those that are stuck maintaining prehistoric code, as revo points out on my answer. (But "may be more efficient" is not true according to my benchmarks, the class intersection method is a tiny bit faster.) – Amadan Dec 04 '18 at 05:25
  • You wrote that a long time ago. You were just a kid back then and no doubt had other things on your mind, so I'm not surprised you forgot about it. – Cary Swoveland Dec 04 '18 at 05:32
  • Thank you @CarySwoveland I'll consider it. – revo Dec 04 '18 at 06:31
  • @Amadan I'd like to know your benchmark result for `[[:punct:]](?<!')` as well if it is possible. – revo Dec 04 '18 at 06:32
  • @revo Edited into my answer. – Amadan Dec 04 '18 at 07:00
  • @Amadan The lookahead approach being a bit slower was predictable, but the lookbehind being slower than both is not, and logically it shouldn't be. Interesting results in Ruby. Thank you. – revo Dec 04 '18 at 07:41

2 Answers

This would be possible using a negative lookahead:

(?!')[[:punct:]]
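
A minimal usage sketch, assuming the goal is to collect the matching characters. The method name `punctuation_except_apostrophes` and the sample sentence are made up for illustration. The lookahead `(?!')` rejects any position where the next character is an apostrophe, and `[[:punct:]]` then consumes one punctuation character.

```ruby
# Lookahead form: fail at any position where an apostrophe comes next,
# otherwise match a single punctuation character.
LOOKAHEAD_PUNCT = /(?!')[[:punct:]]/

# Hypothetical helper: returns the punctuation characters in `text`,
# leaving out apostrophes.
def punctuation_except_apostrophes(text)
  text.scan(LOOKAHEAD_PUNCT)
end

p punctuation_except_apostrophes("Mr. O'Brien! Please don't go.")
# prints [".", "!", "."]
```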
revo

From Ruby docs:

A character class may contain another character class. By itself this isn't useful because [a-z[0-9]] describes the same set as [a-z0-9]. However, character classes also support the && operator which performs set intersection on its arguments.

So, "punctuation but not apostrophe" is:

[[:punct:]&&[^']]
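
As a rough sketch of how this might be used, here is the intersection applied with `gsub` to strip punctuation while keeping apostrophes. The helper name `strip_punctuation_keep_apostrophes` and the sample text are hypothetical, chosen only for this example.

```ruby
# Intersection form: characters that are in [[:punct:]] AND in [^'].
INTERSECTION_PUNCT = /[[:punct:]&&[^']]/

# Hypothetical helper: removes all punctuation except apostrophes.
def strip_punctuation_keep_apostrophes(text)
  text.gsub(INTERSECTION_PUNCT, "")
end

puts strip_punctuation_keep_apostrophes("Don't stop, O'Brien!")
# prints: Don't stop O'Brien
```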

EDIT: At revo's request in the question comments: on my machine, the benchmark below shows the lookahead as ~10% slower and the lookbehind as ~20% slower than the intersection:

require 'benchmark'

N = 1_000_000
STR = "Mr. O'Brien! Please don't go, Mr. O'Brien!"

# Scan STR with the given regex N times.
# (Named run_scan so it does not shadow Kernel#test.)
def run_scan(re)
  N.times { STR.scan(re).size }
end

# 12 is the label column width, so the three reports line up.
Benchmark.bm(12) do |bm|
  bm.report("intersection") { run_scan(/[[:punct:]&&[^']]/) }
  bm.report("lookahead")    { run_scan(/(?!')[[:punct:]]/) }
  bm.report("lookbehind")   { run_scan(/[[:punct:]](?<!')/) }
end
Amadan
  • It may be worth mentioning that older versions of Ruby don't support character class operations. – revo Dec 03 '18 at 05:54
  • @revo: The latest version that doesn't support class intersections was pronounced dead on 2014-07-31, and may it stay buried. :) If someone works with EOL'd versions, it's on them. Or I'd need to make a disclaimer on every single JS question I answer :P – Amadan Dec 03 '18 at 06:07
  • You're right but it doesn't need a disclaimer. Having `v1.9+` right after *Ruby* works well. – revo Dec 03 '18 at 06:10
  • @revo: Not quite - the version I was referring to was 1.9.2. So this is Ruby 1.9.3+. But I'll be damned if I go research version history for every answer I do. And if someone tells me a _currently supported_ version can't handle it, I'd definitely edit it in, as it's relevant information. EOL'd versions? Not worth it. – Amadan Dec 03 '18 at 06:18
  • You don't need to dig deep in the docs for that info (I didn't expect you to, either). Others will give an *FYI* comment on demand, and that means neither that your answer is wrong nor that you spared any effort in providing it. – revo Dec 03 '18 at 06:51