6

I would like to do the following in Clojure. I have a vector of words that need to be removed:

(def forbidden-words [":)" "the" "." "," " " ...many more...])

... and a vector of strings:

(def strings ["the movie list" "this.is.a.string" "haha :)" ...many more...])

So, each forbidden word should be removed from each string, and the result, in this case, would be: ["movie list" "thisisastring" "haha"].

How can I do this?

Zeljko
  • Will this link help you: http://github.com/richhickey/clojure-contrib/blob/bacf49256673242bb7ce09b9f5983c27163e5bfc/src/main/clojure/clojure/contrib/string.clj#L162 – ilija veselica Mar 31 '10 at 14:30

3 Answers

7
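(def forbidden-words [":)" "the" "." ","])
(def strings ["the movie list" "this.is.a.string" "haha :)"])
;; Quote each word so regex metacharacters match literally, then join
;; the quoted words with | into a single alternation pattern.
(let [pattern (->> forbidden-words (map #(java.util.regex.Pattern/quote %))
                (interpose \|)  (apply str))]
  ;; One pass per string: every match of any forbidden word is removed.
  (map #(.replaceAll % pattern "") strings))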
cgrand
  • I like this better because it only does a single pass over the input string. – Stuart Sierra Apr 01 '10 at 18:15
  • Regarding your comment below, have you tried out your own answer with ["th:)e"]? It doesn't work correctly when I try it. – A. Levy Apr 02 '10 at 19:28
  • @ALevy To me, it works as expected: for ["th:)e" ":the)"] it outputs ("the" ":)"), removing only the forbidden words that appear in the input string -- and not forbidden words that appear only after other forbidden words have been removed. My solution is the only one whose return values don't depend on the ordering of the forbidden-words vector. – cgrand Apr 03 '10 at 18:11
  • I like this solution the most because it does not use loops and it's fast. – Zeljko Apr 14 '10 at 23:31
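
For readers on Clojure 1.2 or later, the single-pass idea discussed in the comments above can be sketched with clojure.string instead of Java interop. The helper names forbidden-pattern and remove-forbidden are made up for this illustration and do not come from the answer.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Hypothetical helper: builds one regex that matches any of the words.
;; Pattern/quote makes characters such as . and ) match literally.
(defn forbidden-pattern [words]
  (re-pattern (str/join "|" (map #(Pattern/quote %) words))))

;; Hypothetical helper: removes every match in a single scan of s, so
;; text that only becomes a forbidden word after a removal is kept.
(defn remove-forbidden [words s]
  (str/replace s (forbidden-pattern words) ""))

(def forbidden-words [":)" "the" "." ","])

(map #(remove-forbidden forbidden-words %) ["th:)e" ":the)"])
;; => ("the" ":)")
```

One caveat, as far as I can tell: Java regex alternation tries the alternatives from left to right, so if one forbidden word is a prefix of another (say "the" and "then"), the order of the vector can still affect the result. Sorting the words longest first before joining them is a common workaround.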
1
(use 'clojure.contrib.str-utils)
(import 'java.util.regex.Pattern)
(def forbidden-words [":)" "the" "." "," " "])
(def strings ["the movie list" "this.is.a.string" "haha :)"])
(def regexes (map #(Pattern/compile % Pattern/LITERAL) forbidden-words))
(for [s strings] (reduce #(re-gsub %2 "" %1) s regexes))
Jouni K. Seppänen
  • +1, since this works. For those who'd like to test this on the bleeding edge, note that `clojure.contrib.str-utils` has been renamed to `clojure.contrib.string` in the current sources and `re-gsub` has become `replace-re`. Also note that if removing a word from between two other words should entail removing exactly one of the spaces surrounding it (rather than none, as with the code above) *and* words at the beginning and end of the string were to be handled correctly, then somewhat more involved regex magic would be called for. – Michał Marczyk Mar 31 '10 at 20:24
  • Your call to `Pattern/compile` can be replaced with `re-pattern`. – Brian Carper Mar 31 '10 at 20:41
  • @Brian: `re-pattern` doesn't accept the `Pattern/LITERAL` argument which is necessary here. – Michał Marczyk Mar 31 '10 at 22:32
  • All multipass answers are faulty; try your solution with the input ["th:)e"]. – cgrand Apr 01 '10 at 21:33
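
On the whitespace point raised in the comments above: a simpler stand-in for the "more involved regex magic" is to remove the words first and then normalize the whitespace. This is a sketch under that assumption, not the approach the commenter had in mind. It collapses every run of whitespace, including runs that were already in the input, and the name remove-and-tidy is made up for the example.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Hypothetical helper: strips the forbidden words in one pass, then
;; collapses whitespace runs to one space and trims both ends.
(defn remove-and-tidy [words s]
  (let [pattern (re-pattern (str/join "|" (map #(Pattern/quote %) words)))]
    (-> s
        (str/replace pattern "")
        (str/replace #"\s+" " ")
        str/trim)))

(map #(remove-and-tidy [":)" "the" "." ","] %)
     ["the movie list" "this.is.a.string" "haha :)"])
;; => ("movie list" "thisisastring" "haha")
```

Note that " " is left out of the forbidden words here; with it included, all spaces would be removed and there would be nothing left to tidy.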
0

Using function composition and the -> macro this can be nice and simple:

(for [s strings]
  (-> s ((apply comp
           (for [word forbidden-words] #(.replace %1 word ""))))))

If you want to be more 'idiomatic', you can use replace-str from clojure.contrib.string, instead of #(.replace %1 s "").

No need to use regexes here.
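
In Clojure 1.2 and later, clojure.string/replace accepts a plain string as the match and treats it literally, so the same multi-pass removal can be written without clojure.contrib. The sketch below swaps the comp construction for a reduce, which applies the words from left to right; remove-each is a name invented for this example.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: removes the words one after another
;; (multi-pass), in the order they appear in the vector.
(defn remove-each [words s]
  (reduce (fn [acc word] (str/replace acc word "")) s words))

(remove-each [":)" "the" "." ","] "haha :)")  ;; => "haha "
(remove-each [":)" "the" "." ","] "th:)e")    ;; => ""
(remove-each ["the" ":)"] "th:)e")            ;; => "the"
```

The last two calls show that a multi-pass removal depends on the order of the words: removing ":)" first exposes a new "the", which the next pass then deletes.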

Michiel Borkent
  • All multipass answers are inherently broken: (def forbidden-words [":)" "the" "." ","]) (for [s [":the)"]] (-> s ((apply comp (for [s forbidden-words] #(.replace %1 s "")))))) ;; this returns ("") – cgrand Apr 01 '10 at 21:30