1

How can I test whether a Ruby string contains only a specific set of characters?

For example, if my set of allowed characters is "AGHTM" plus digits 0-9,

  • the string "MT3G22AH" is valid;

  • the string "TAR34" is invalid (because of the R).

user513951
lu1s
  • When asking it's important to show us your effort. Did you search and not find anything? Then show where you searched and explain why it didn't help. Did you write code? If not, why? If you did and can't get it to work then read "[mcve]". Currently it looks like you didn't try and want us to write the code for you which is not what SO is for. http://meta.stackoverflow.com/q/261592/128421 – the Tin Man Jan 19 '17 at 20:20
  • Is "MT3G22AH" the entire string or is it a sub-string? – the Tin Man Jan 19 '17 at 20:22
  • @theTinMan that's the entire string, and it was an example of the real set. I don't understand regex well yet and couldn't find how to handle this particular case on SO or in other resources. I was in a hurry to solve a game-like challenge in codefights just for fun before getting busier. Thanks for the SO enlightenment on how to ask questions. I didn't know this was such a shame :( I hope this question is not a duplicate; if it is, let me know. – lu1s Jan 19 '17 at 20:40
  • It's important to understand that Stack Overflow is a reference book, not a discussion list. Every question and its associated answers is a separate article in the reference book, and, to help others, we need questions to have certain things. You ask a question but you're really asking it for the future users searching for a solution too. You know your system and what you tried but they have no idea so the better defined the problem the more it helps others. – the Tin Man Jan 19 '17 at 21:44
  • If I had found this question on SO I wouldn't have asked it. With some downvotes, or if it is closed as too broad, it won't be available in that reference book anymore. I myself voted it as too broad but still can't find a duplicate. – lu1s Jan 19 '17 at 21:56
  • https://stackoverflow.com/a/3878656/128421 covers a number of ways. – the Tin Man Feb 14 '20 at 19:20

4 Answers

6

A nicely idiomatic non-regex solution is to use String#count:

"MT3G22AH".count("^AGHTM0-9").zero?  # => true
"TAR34".count("^AGHTM0-9").zero?     # => false

The inverse also works, if you find it more readable:

"MT3G22AH".count('AGHTM0-9') == "MT3G22AH".size  # => true

Take your pick.

For longer strings, both methods here perform significantly better than regex-based options.
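To reuse the check, the negated-set form can be wrapped in a small helper. This is only a sketch: the names `ALLOWED_SET` and `only_allowed?` are made up for illustration. It relies on the documented `String#count` behavior that a leading caret negates the set and that `0-9` is read as a range.

```ruby
# Hypothetical constant: the allowed characters in String#count set syntax.
ALLOWED_SET = "AGHTM0-9"

# Returns true when the string has no character outside the allowed set.
# The leading "^" negates the set, so count tallies the disallowed characters.
def only_allowed?(string, allowed = ALLOWED_SET)
  string.count("^#{allowed}").zero?
end

puts only_allowed?("MT3G22AH")  # true
puts only_allowed?("TAR34")     # false
puts only_allowed?("")          # true, since an empty string has nothing disallowed
```

Note that an empty string passes this check; add an explicit `string.empty?` guard if that should be rejected.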

user513951
  • I didn't see the point at first, because it is so similar to my method. It does appear to be overall faster though, possibly because it doesn't create a new string. – Eric Duminil Jan 19 '17 at 19:42
  • For readability: `string.count('AGHTM0-9') == string.size` might be better. It's just as fast as your method. – Eric Duminil Jan 19 '17 at 19:59
4
allowed_chars = "AGHTM"
# \A and \z anchor the whole string; the /i flag also accepts lowercase letters
allowed = /\A[\d#{allowed_chars}]+\z/i

"MT3G22AH" =~ allowed #⇒ truthy
"TAR34" =~ allowed #⇒ falsey
Aleksei Matiushkin
  • Nice! Thanks. And since I don't like `nil` and `0` returns I'm testing now like this: `!!('MT3G22AH' =~ allowed)` – lu1s Jan 19 '17 at 18:56
  • Double-negation is a quick way to cast to bool. In Ruby 2.4 you can also do `"x".match?(/y/)` if you want, the [`match?`](https://ruby-doc.org/core-2.4.0/String.html#method-i-match-3F) method returns a boolean. – tadman Jan 19 '17 at 18:58
  • I was expecting your method to be much faster with long strings of unallowed characters, since it should stop directly after a non-match. The Regex solution seems to be always at least 5x slower than delete/count. – Eric Duminil Jan 19 '17 at 19:17
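Following up on the `match?` suggestion in the comments, here is a minimal sketch of a boolean, case-sensitive variant for Ruby 2.4 or later. The names `ALLOWED_PATTERN` and `valid_code?` are hypothetical. It also shows why `\A` and `\z` are used rather than `^` and `$`, which match at line boundaries in Ruby.

```ruby
# Hypothetical constant: whole-string, case-sensitive pattern (no /i flag).
ALLOWED_PATTERN = /\A[AGHTM0-9]+\z/

# String#match? (Ruby 2.4+) returns true or false instead of an index or nil.
def valid_code?(string)
  string.match?(ALLOWED_PATTERN)
end

puts valid_code?("MT3G22AH")  # true
puts valid_code?("TAR34")     # false
puts valid_code?("mt3g")      # false, because lowercase is not in the set
puts valid_code?("")          # false, because + requires at least one character

# Line anchors are not enough: the first line alone satisfies the pattern.
puts "MT3\nTAR".match?(/^[AGHTM0-9]+$/)  # true
puts valid_code?("MT3\nTAR")             # false
```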
3

String#delete

One possibility is to delete all the allowed characters and check if the resulting string is empty:

"MT3G22AH".delete("AGHTM0-9").empty?
#=> true
"TAR34".delete("AGHTM0-9").empty?
#=> false
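A side benefit of this approach is that whatever `delete` leaves behind is exactly the set of offending characters, which can be handy for error messages. A small sketch, where the names `ALLOWED_CHARS` and `disallowed_chars` are made up for illustration:

```ruby
# Hypothetical constant: allowed characters in String#delete set syntax.
ALLOWED_CHARS = "AGHTM0-9"

# Returns the distinct characters that are not in the allowed set,
# in order of first appearance. An empty array means the string is valid.
def disallowed_chars(string, allowed = ALLOWED_CHARS)
  string.delete(allowed).chars.uniq
end

p disallowed_chars("MT3G22AH")  # []
p disallowed_chars("TAR34")     # ["R"]
p disallowed_chars("XTARX")     # ["X", "R"]
```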

Performance

Short strings

For short strings, @steenslag's method is the fastest, followed by @Jesse's and mine.

def mudasobwa(string)
  allowed = 'AGHTM'
  allowed = /\A[\d#{allowed}]+\z/i
  string.match? allowed
end

def eric(string)
  string.delete('AGHTM0-9').empty?
end

def meagar(string)
  allowed = 'AGHTM0123456789'
  string.chars.uniq.all? { |c| allowed.include?(c) }
end

def jesse(string)
  string.count('^AGHTM0-9').zero?
end

def steenslag(string)
  !string.match?(/[^AGHTM0-9]/) 
end

require 'fruity'

n = 1
str1 = 'MT3G22AH' * n
str2 = 'TAR34' * n
compare do
  _jesse { [jesse(str1), jesse(str2)] }
  _eric { [eric(str1), eric(str2)] }
  _mudasobwa { [mudasobwa(str1), mudasobwa(str2)] }
  _meagar { [meagar(str1), meagar(str2)] }
  _steenslag { [steenslag(str1), steenslag(str2)] }
end

It outputs:

Running each test 1024 times. Test will take about 2 seconds.
_steenslag is faster than _jesse by 2.2x ± 0.1
_jesse is faster than _eric by 8.000000000000007% ± 1.0%
_eric is faster than _meagar by 4.3x ± 0.1
_meagar is faster than _mudasobwa by 2.4x ± 0.1

Longer strings

For longer strings (n = 5000), @Jesse becomes the fastest method.

Running each test 32 times. Test will take about 12 seconds.
_jesse is faster than _eric by 2.5x ± 0.01
_eric is faster than _mudasobwa by 4x ± 1.0
_mudasobwa is faster than _steenslag by 2x ± 0.1
_steenslag is faster than _meagar by 11x ± 0.1
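Before trusting any timing, it is worth confirming that the candidates agree on the same inputs. The following is a rough sketch using only the standard library's `Benchmark` module instead of the fruity gem, so its numbers are not comparable to the output above. The names `CHECKS`, `SAMPLES` and `RESULTS` are made up, and only three of the five methods are included. It assumes Ruby 2.4 or later.

```ruby
require 'benchmark'

# Three of the approaches discussed here, keyed by a short label.
CHECKS = {
  count:  ->(s) { s.count('^AGHTM0-9').zero? },
  delete: ->(s) { s.delete('AGHTM0-9').empty? },
  regex:  ->(s) { !s.match?(/[^AGHTM0-9]/) }
}.freeze

# 'A0' is included so that a set written as 1-9 instead of 0-9 would be caught.
SAMPLES = ['MT3G22AH', 'TAR34', 'A0', 'MT3G22AH' * 5000].freeze

# Correctness first: every check should give the same answers.
RESULTS = CHECKS.transform_values { |check| SAMPLES.map { |s| check.call(s) } }
raise 'implementations disagree' unless RESULTS.values.uniq.size == 1

# Then a crude wall-clock timing of each check over all samples.
CHECKS.each do |name, check|
  seconds = Benchmark.realtime { 100.times { SAMPLES.each { |s| check.call(s) } } }
  puts format('%-8s %.4fs', name, seconds)
end
```

Timings will vary with the Ruby version and machine, so treat any ranking from this sketch as indicative only.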
Eric Duminil
1

This seems to be faster than all the methods in the previous benchmarks by @Eric Duminil (Ruby 2.4):

!string.match?(/[^AGHTM0-9]/) 
steenslag