Validate that string contains only allowed characters in Ruby

Question

How can I test whether a Ruby string contains only a specific set of characters?

For example, if my set of allowed characters is "AGHTM" plus digits 0-9,

the string "MT3G22AH" is valid;
the string "TAR34" is invalid (because of the R).

When asking, it's important to show us your effort. Did you search and not find anything? Then show where you searched and explain why it didn't help. Did you write code? If not, why? If you did and can't get it to work, then read "[mcve]". Currently it looks like you didn't try and want us to write the code for you, which is not what SO is for. http://meta.stackoverflow.com/q/261592/128421 – the Tin Man Jan 19 '17 at 20:20
@theTinMan that's the entire string, and it was an example of the real set. I don't understand regex well yet and couldn't find how to handle this particular case on SO or in other resources. I was in a hurry to solve a game-like challenge on codefights just for fun before getting busier. Thanks for the SO enlightenment on how to ask questions. I didn't know this was such a shame :( I hope this question is not a duplicate; if it is, let me know. – lu1s Jan 19 '17 at 20:40
It's important to understand that Stack Overflow is a reference book, not a discussion list. Every question and its associated answers is a separate article in the reference book, and, to help others, we need questions to have certain things. You ask a question, but you're really asking it for future users searching for a solution too. You know your system and what you tried, but they have no idea, so the better defined the problem, the more it helps others. – the Tin Man Jan 19 '17 at 21:44
If I had found this question on SO, I wouldn't have asked it. With some downvotes, or if it is closed as too broad, it will no longer be available in that reference book. I myself voted to close it as too broad but still can't find a duplicate. – lu1s Jan 19 '17 at 21:56
https://stackoverflow.com/a/3878656/128421 covers a number of ways. – the Tin Man Feb 14 '20 at 19:20

user513951 · Answer 1 · 2020-05-18T19:17:43.697

score 6

A nicely idiomatic non-regex solution is to use String#count:

"MT3G22AH".count("^AGHTM0-9").zero?  # => true
"TAR34".count("^AGHTM0-9").zero?     # => false

The inverse also works, if you find it more readable:

"MT3G22AH".count('AGHTM0-9') == "MT3G22AH".size  # => true

Take your pick.

For longer strings, both methods here perform significantly better than regex-based options.
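
To make the character-set syntax explicit: a leading `^` in the argument to `String#count` negates the set, and `0-9` is a range, so the first form counts the characters that fall outside the allowed set. A minimal sketch, wrapped in a helper whose name (`only_allowed?`) is an assumption made here for illustration:

```ruby
# Hypothetical helper; the name and the default argument are illustrative.
def only_allowed?(string, allowed = "AGHTM0-9")
  # "^" at the start negates the set, so this counts disallowed characters.
  string.count("^#{allowed}").zero?
end

p only_allowed?("MT3G22AH")    # => true
p only_allowed?("TAR34")       # => false
p "TAR34".count("^AGHTM0-9")   # => 1 (only the "R" is outside the set)
```

Note that an empty string has zero disallowed characters, so it passes this check.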

answered Jan 19 '17 at 19:00

I didn't see the point at first, because it is so similar to my method. It does appear to be overall faster though, possibly because it doesn't create a new string. – Eric Duminil Jan 19 '17 at 19:42
For readability, `string.count('AGHTM0-9') == string.size` might be better. It's just as fast as your method. – Eric Duminil Jan 19 '17 at 19:59

score 4 · Answer 2 · answered Jan 19 '17 at 18:48

allowed = "AGHTM"
allowed = /\A[\d#{allowed}]+\z/i

"MT3G22AH" =~ allowed #⇒ truthy
"TAR34" =~ allowed #⇒ falsey
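
Two details of this pattern are worth seeing in action: `=~` returns a match index or `nil` rather than a boolean, and the `/i` flag makes the check case-insensitive, so lowercase letters pass too. The sketch below is only an illustration; the constant names and the `STRICT` variant without `/i` are assumptions, not part of the answer:

```ruby
ALLOWED = "AGHTM" # the letter set from the answer

# Case-insensitive, as written in the answer.
LOOSE  = /\A[\d#{ALLOWED}]+\z/i
# Assumed variant for an uppercase-only set: same pattern without /i.
STRICT = /\A[\d#{ALLOWED}]+\z/

p "MT3G22AH" =~ LOOSE         # => 0   (index of the match, truthy)
p "TAR34" =~ LOOSE            # => nil (falsey)
p "mt3g22ah".match?(LOOSE)    # => true
p "mt3g22ah".match?(STRICT)   # => false
```

`String#match?` requires Ruby 2.4 or later and returns a plain `true` or `false`.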

Aleksei Matiushkin

Nice! Thanks. And since I don't like `nil` and `0` returns I'm testing now like this: `!!('MT3G22AH' =~ allowed)` – lu1s Jan 19 '17 at 18:56
Double-negation is a quick way to cast to bool. In Ruby 2.4 you can also do `"x".match?(/y/)` if you want, the [`match?`](https://ruby-doc.org/core-2.4.0/String.html#method-i-match-3F) method returns a boolean. – tadman Jan 19 '17 at 18:58
I was expecting your method to be much faster with long strings of disallowed characters, since it should stop right after the first non-match. The regex solution always seems to be at least 5x slower than delete/count. – Eric Duminil Jan 19 '17 at 19:17

Eric Duminil · Answer 3 · 2017-01-19T19:41:52.443

String#delete

One possibility is to delete all the allowed characters and check if the resulting string is empty:

"MT3G22AH".delete("AGHTM0-9").empty?
#=> true
"TAR34".delete("AGHTM0-9").empty?
#=> false
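
A side effect of this approach is that whatever remains after the deletion is exactly the set of offending characters, which can be handy for error messages. A small sketch (the variable names are illustrative):

```ruby
input = "TAR34"

# String#delete returns a new string; the receiver is left untouched.
leftover = input.delete("AGHTM0-9")

p leftover         # => "R"
p leftover.empty?  # => false
p input            # => "TAR34"
```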

Performance

Short strings

For short strings, @steenslag's method is the fastest, followed by @Jesse's and mine.

def mudasobwa(string)
  allowed = 'AGHTM'
  allowed = /\A[\d#{allowed}]+\z/i
  string.match? allowed
end

def eric(string)
  string.delete('AGHTM0-9').empty?
end

def meagar(string)
  allowed = 'AGHTM0123456789'
  string.chars.uniq.all? { |c| allowed.include?(c) }
end

def jesse(string)
  string.count('^AGHTM0-9').zero?
end

def steenslag(string)
  !string.match?(/[^AGHTM0-9]/) 
end

require 'fruity'

n = 1
str1 = 'MT3G22AH' * n
str2 = 'TAR34' * n
compare do
  _jesse { [jesse(str1), jesse(str2)] }
  _eric { [eric(str1), eric(str2)] }
  _mudasobwa { [mudasobwa(str1), mudasobwa(str2)] }
  _meagar { [meagar(str1), meagar(str2)] }
  _steenslag { [steenslag(str1), steenslag(str2)] }
end

It outputs:

Running each test 1024 times. Test will take about 2 seconds.
_steenslag is faster than _jesse by 2.2x ± 0.1
_jesse is faster than _eric by 8.000000000000007% ± 1.0%
_eric is faster than _meagar by 4.3x ± 0.1
_meagar is faster than _mudasobwa by 2.4x ± 0.1

Longer strings

For longer strings (n = 5000), @Jesse's method becomes the fastest.

Running each test 32 times. Test will take about 12 seconds.
_jesse is faster than _eric by 2.5x ± 0.01
_eric is faster than _mudasobwa by 4x ± 1.0
_mudasobwa is faster than _steenslag by 2x ± 0.1
_steenslag is faster than _meagar by 11x ± 0.1
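
The numbers above come from the `fruity` gem. For readers who would rather not install it, a rough equivalent can be sketched with Ruby's standard `benchmark` library. This is not the benchmark used above: the lambda table, the labels, and the iteration count of 100 are assumptions chosen for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Three of the approaches from this thread, as lambdas (labels are illustrative).
CANDIDATES = {
  count:  ->(s) { s.count('^AGHTM0-9').zero? },
  delete: ->(s) { s.delete('AGHTM0-9').empty? },
  regex:  ->(s) { !s.match?(/[^AGHTM0-9]/) }
}

n = 5000
valid   = 'MT3G22AH' * n
invalid = 'TAR34' * n

# Sanity check: every candidate must agree on both inputs before timing.
CANDIDATES.each do |name, check|
  raise "#{name} gives a wrong answer" unless check.(valid) && !check.(invalid)
end

# Print user/system/real time for each candidate.
Benchmark.bm(8) do |x|
  CANDIDATES.each do |name, check|
    x.report(name.to_s) { 100.times { check.(valid); check.(invalid) } }
  end
end
```

Changing `n` to 1 gives a rough analogue of the short-string case.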

The OP says "MT3G22AH" is the entire string, so benchmarking longer strings doesn't address the problem. It's useful for other search uses, though. – the Tin Man Jan 19 '17 at 21:58
`delete` creates another string; just use `count`, which supports the same `tr` character sets. – akuhn Jan 19 '17 at 23:19

score 1 · Accepted Answer · answered Jan 19 '17 at 19:21

This seems to be faster than all the methods in the previous benchmarks (by @Eric Duminil), on Ruby 2.4:

!string.match?(/[^AGHTM0-9]/)
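
This check asks "is there any disallowed character?" and negates the answer, so, unlike an anchored pattern with `+`, it accepts the empty string. A short sketch illustrating the difference (the helper name `no_disallowed?` is an assumption made here):

```ruby
# Hypothetical wrapper around the one-liner above.
def no_disallowed?(string)
  !string.match?(/[^AGHTM0-9]/)
end

p no_disallowed?("MT3G22AH")        # => true
p no_disallowed?("TAR34")           # => false
p no_disallowed?("")                # => true  (nothing disallowed was found)
p "".match?(/\A[AGHTM0-9]+\z/)      # => false (the "+" needs at least one character)
```

Whether an empty string should count as valid depends on the use case, so it may be worth checking `string.empty?` separately.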

steenslag

With a longer string, it becomes the second slowest method. – Eric Duminil Jan 19 '17 at 19:34
Regexes, especially non-anchored ones, slow down drastically as the string gets longer. – the Tin Man Jan 19 '17 at 21:46
This is from a while back, but the effect of anchoring the pattern still applies. http://stackoverflow.com/a/3878656/128421 – the Tin Man Jan 19 '17 at 22:02
