I am working on a Ruby on Rails application. How can I remove a specific HTML tag that has a particular attribute, as shown below?

post = Post.find(1646).content

 => "<p>this is just another update</p><p data-f-id=\"pbf\" style=\"text-align: center; font-size: 14px; margin-top: 30px; opacity: 0.65; font-family: sans-serif;\">Powered by <a href=\"any href link\" title=\"xyz\">remove it</a></p>"

I need to completely remove the following paragraph from the content above:

<p data-f-id=\"pbf\" style=\"text-align: center; font-size: 14px; margin-top: 30px; opacity: 0.65; font-family: sans-serif;\">Powered by <a href=\"any href link\" title=\"xyz\">remove it</a></p>

How can I identify this <p data-f-id="pbf"> element, using regex or something else? Any help would be appreciated. :)

code_aks

1 Answer


Don't use regex to parse HTML. Use an HTML parser instead.

There are several popular HTML parsing libraries in Ruby. Here is one way to do it, using Nokogiri:

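post = Post.find(1646).content
# Parse as a fragment so that no <html>/<body> wrapper is added
document = Nokogiri::HTML::DocumentFragment.parse(post)
# Remove every <p> whose data-f-id attribute equals "pbf"
document.css('p[data-f-id="pbf"]').remove

document.to_s
  #=> "<p>this is just another update</p>"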
Tom Lord