144

I'm working with Ruby on Rails. Is there a way to strip HTML from a string, using sanitize or an equivalent method, while keeping only the text inside the value attribute of an input tag?

Castiblanco
Mattias

9 Answers

206

If we want to use this in a model:

ActionView::Base.full_sanitizer.sanitize(html_string)

This is the code behind the "strip_tags" helper method.

Jon
  • 32
    This works, but referring to ActionView from the model is awkward. More cleanly, you can `require 'html/sanitizer'` and instantiate your own sanitizer with `HTML::FullSanitizer.new`. – Nik Haldimann Jan 08 '13 at 20:49
  • 10
    @nhaldimann, `require 'html/sanitizer'` raises an error, so I have to use `Rails::Html::FullSanitizer.new` (http://edgeapi.rubyonrails.org/classes/HTML/FullSanitizer.html#method-i-sanitize) – Linh Dam Jul 04 '16 at 10:46
  • 1
    I'm using `Rails::Html::FullSanitizer.new.sanitize(string)` with Rails 7 – dostu Jun 09 '23 at 15:02
149

There's a strip_tags method in ActionView::Helpers::SanitizeHelper:

http://api.rubyonrails.org/classes/ActionView/Helpers/SanitizeHelper.html#method-i-strip_tags

Edit: to get the text inside the value attribute, you could use something like Nokogiri with an XPath expression to pull it out of the string.

Michael Kohl
35
ActionView::Base.full_sanitizer.sanitize(html_string)

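An allow list of tags and attributes can be specified as below. Note that the full sanitizer strips every tag regardless of these options, so this needs the safe list sanitizer (named white_list_sanitizer in older Rails versions):

ActionView::Base.safe_list_sanitizer.sanitize(html_string, :tags => %w(img br p), :attributes => %w(src style))

The statement above allows the tags img, br and p and the attributes src and style.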

Satishakumar Awati
34

Yes, call this from a view or helper: sanitize(html_string, tags: [])

Abram
bcackerman
10

I've used the Loofah library, as it is suitable for both HTML and XML (both documents and string fragments). It is the engine behind the rails-html-sanitizer gem. I'm simply pasting its code example to show how simple it is to use.

Loofah Gem

unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"

doc = Loofah.fragment(unsafe_html).scrub!(:prune)
doc.to_s    # => "ohai! <div>div is safe</div> "
doc.text    # => "ohai! div is safe "
taiar
Krishna Vedula
3

If you want to remove all HTML tags, you can use:

   html_string.gsub(/<[^>]*>/, '')
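
As a quick illustration of where the regex approach works and where it falls short, here is a small sketch using only core Ruby. The method name and sample strings are invented for the example. The pattern handles simple markup, but a `>` inside an attribute value ends the match early and leaves debris behind.

```ruby
# Naive tag stripping: removes anything between "<" and the next ">".
TAG_PATTERN = /<[^>]*>/

def strip_tags_naive(html)
  html.gsub(TAG_PATTERN, '')
end

simple = '<p>Hello <b>world</b></p>'
tricky = '<input value="a > b">'

puts strip_tags_naive(simple)          # Hello world
puts strip_tags_naive(tricky).inspect  # " b\">" (the match stopped at the first ">")
```

For untrusted or irregular HTML, a parser-based sanitizer is likely the safer choice.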
s1mpl3
2

How about this?

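# One-off cleanup: sanitize every String attribute of the listed models.
# (Newer rails-html-sanitizer versions name this class SafeListSanitizer.)
white_list_sanitizer = Rails::Html::WhiteListSanitizer.new
WHITELIST = %w[p b h1 h2 h3 h4 h5 h6 li ul ol small i u].freeze

[Your, Models, Here].each do |klass|
  # find_each loads records in batches instead of all at once
  klass.find_each do |ob|
    klass.attribute_names.each do |attr|
      value = ob.public_send(attr)
      next unless value.is_a?(String)

      cleaned = white_list_sanitizer.sanitize(value, tags: WHITELIST, attributes: %w[id style])
      # Also drop empty paragraphs left behind by the editor
      ob.public_send("#{attr}=", cleaned.gsub(/<p>\s*<\/p>\r\n/im, ''))
    end
    ob.save # save once per record, after all attributes are cleaned
  end
end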
josetapadas
0

This is working for me in Rails 6.1.3:

.errors-description
  = sanitize(message, tags: %w[div span strong], attributes: %w[class])
Sergio Belevskij
0

If your HTML is coming from ActionText, you can do .to_plain_text:

@my_content = ActionText::Content.new("<p>My HTML String</p>")
@my_content.to_plain_text
# => "My HTML String"

https://www.rubydoc.info/github/rails/rails/ActionText%2FContent:to_plain_text

Joshua Pinter
drjorgepolanco