
There is a list of words and a list of banned words. I want to go through the word list and redact all the banned words. This is what I ended up doing (notice the catched boolean):

puts "Give input text:"
text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

words = text.split(" ")
redacted = redacted.split(" ")
catched = false

words.each do |word|
  redacted.each do |redacted_word|
    if word == redacted_word
      catched = true
      print "REDACTED "
      break
    end
  end
  if catched == true
    catched = false
  else
    print word + " "
  end
end

Is there a more idiomatic or more efficient way to do this?

potashin
Mikko Vedru

3 Answers


This also works, although it removes the banned words instead of replacing them with REDACTED:

words - redacted

The array operators +, - and & are simple and useful:

irb(main):016:0> words = ["a", "b", "a", "c"]
=> ["a", "b", "a", "c"]
irb(main):017:0> redacted = ["a", "b"]
=> ["a", "b"]
irb(main):018:0> words - redacted
=> ["c"]
irb(main):019:0> words + redacted
=> ["a", "b", "a", "c", "a", "b"]
irb(main):020:0> words & redacted
=> ["a", "b"]
pangpang
  • The only problem is that this isn't very flexible. If you needed to make it case-insensitive for example, you'd have to switch to one of the other solutions. – Mark Thomas May 05 '15 at 19:28
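Taking up that comment, here is a minimal sketch of a case-insensitive variant that still works on arrays. It assumes that comparing String#downcase forms is good enough for the use case, and it keeps the lowercased banned words in a Set so each lookup is a hash lookup rather than a scan of the banned list. `remove_banned` is a hypothetical helper name.

```ruby
require "set"

# Hypothetical helper: drops every word whose lowercase form is banned.
def remove_banned(words, banned)
  banned_set = banned.map(&:downcase).to_set # lowercase the banned list once, up front
  words.reject { |w| banned_set.include?(w.downcase) }
end

p remove_banned(%w[A fnord is a Fnord], %w[FNORD])
# => ["A", "is", "a"]
```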

You can use Array#reject to drop every word that is present in the redacted array:

words.reject {|w| redacted.include? w}


If you want to get the list of banned words that are present in the words array, you can use .select:

words.select {|w| redacted.include? w}

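Both reject and select drop or keep words rather than replacing them. If the goal is the question's output, with REDACTED in place of each banned word, a minimal sketch could use map instead. The Set is an addition not found in this answer; it gives average constant-time lookups, which only matters when the banned list is long. `redact` is a hypothetical helper name.

```ruby
require "set"

# Hypothetical helper: replaces each banned whitespace-separated token with "REDACTED".
def redact(text, banned_words)
  banned = banned_words.to_set
  text.split.map { |w| banned.include?(w) ? "REDACTED" : w }.join(" ")
end

puts redact("A fnord is a fnord.", %w[ford fnord foo])
# prints: A REDACTED is a fnord.
```

Like the question's loop, this compares whole tokens, so the trailing `fnord.` is left alone; unlike the loop, it needs no flag variable and produces no trailing space.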

potashin

This might be a bit more 'elegant'. Whether it's more or less efficient than your solution, I don't know.

puts "Give input text:"
original_text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

redacted_words = redacted.split

print(
  redacted_words.inject(original_text) do |text, redacted_word|
    text.gsub(/\b#{redacted_word}\b/, 'REDACTED')
  end
)

So what's going on here?

  • I'm using String#split without an argument, because ' ' is the default, anyway.
  • With Array#inject, the following block (starting at do and ending at end) is executed for each element in the array, in this case our list of forbidden words.
    • In each round, the second argument to the block is the respective element from the array.
    • The first argument to the block is the block's return value from the previous round. For the first round, the argument passed to inject (in our case original_text) is used.
    • The block's return value from the last round is used as the return value of inject.
  • In the block, I replace all occurrences of the currently handled redacted word in the text.
    • String#gsub performs a global substitution
    • As the pattern to be substituted, I use a regexp literal (/.../). Strictly speaking it's not a plain literal, since I use string interpolation (#{...}) to insert the currently handled redacted word into it.
    • In the regexp, I surround the word to be redacted with \b word-boundary anchors. They match the boundary between a word character (letter, digit or underscore) and a non-word character (or vice versa) without matching any of the characters themselves: they match the zero-length position between the characters. If a string starts or ends with a word character, \b also matches at the start or end of the string, respectively, so we can use it to match whole words.
  • The result of inject (which is the result of the last execution of the block, i.e., the text when all the substitutions have taken place) is passed as an argument to print, which will output the now redacted text.
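To see the word-boundary behaviour described above in isolation, here is a small sketch with a made-up sample string: \b lets `fnord` match before a comma or a period, but not inside the longer word `fnords`.

```ruby
text = "A fnord, two fnords, one fnord."

# \b matches the zero-length position between a word character and a non-word character
puts text.gsub(/\bfnord\b/, "REDACTED")
# prints: A REDACTED, two fnords, one REDACTED.

# Without \b, the substring inside "fnords" is replaced too
puts text.gsub(/fnord/, "REDACTED")
# prints: A REDACTED, two REDACTEDs, one REDACTED.
```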

Note that, unlike your solution, mine does not treat punctuation as part of the adjacent word.

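Also note that my solution is vulnerable to regex injection: each banned word is interpolated into the pattern unescaped, so regex metacharacters such as . keep their special meaning.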

Example 1:

Give input text:
A fnord is a fnord.
Give redacted word:
ford fnord foo

My output:

A REDACTED is a REDACTED.

Your output:

A REDACTED is a fnord.

Example 2:

Give input text:
A fnord is a fnord.
Give redacted word:
fnord.

My output:

A REDACTEDis a fnord.

(Note how the . was interpreted to match any character.)

Your output:

A fnord is a REDACTED
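To close the regex-injection hole that Example 2 exposes, one option (a sketch, not part of the original answer) is Regexp.union: given plain strings, it escapes their metacharacters, and it combines all banned words into one alternation, so the text is scanned in a single gsub pass instead of once per word. The helper name `redact_text` and the sample string "a.b axb" are made up for illustration; with the unescaped pattern /\ba.b\b/, the . would also match the x in axb.

```ruby
# Hypothetical helper: redacts whole-word matches of any banned word.
def redact_text(text, banned_words)
  return text if banned_words.empty?
  # Regexp.union escapes metacharacters in plain strings (e.g. "." becomes "\.")
  pattern = /\b(?:#{Regexp.union(banned_words).source})\b/
  text.gsub(pattern, "REDACTED")
end

puts redact_text("a.b axb", ["a.b"])
# prints: REDACTED axb
puts redact_text("A fnord is a fnord.", %w[ford fnord foo])
# prints: A REDACTED is a REDACTED.
```

Because of the surrounding \b anchors, a banned word that ends in punctuation, such as fnord., still cannot match, since there is no word boundary right after the period.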
das-g