
I am a Ruby beginner and I ran into a problem. I would like to know if there is a more 'Ruby way' to solve it.

My problem is this: I have a string like this:

 str = "<div class=\"yui-u first\">\r\n\t\t\t\t\t<h1>Jonathan Doe</h1>\r\n
 \t\t\t\t\t<h2>Web   Designer, Director</h2>\r\n\t\t\t\t</div>"

 # Now I want to replace the text inside <h1></h1> and <h2></h2> with
 # these two strings: "fooo" and "barr".

Here is what I did:

# First, I get the exact matching substrings of str:
r = str.scan(/(?<=<h\d>).*?(?=<\/h\d>)/)
# Then I create a hash that maps each match to its replacement string
h = {r[0] => 'fooo', r[1] => 'barr'}
# Finally, I use str.gsub! to replace the matched strings
str.gsub!(/(?<=<h\d>).*?(?=<\/h\d>)/, h)
# or like this
str.gsub!(/(?<=<h\d>).*?(?=<\/h\d>)/) {|v| h[v]}

PS: The substrings inside <h1></h1> and <h2></h2> are not fixed, so I have to extract them FIRST in order to build the hash table. But I really don't like the code above (because I wrote two almost identical lines); I think there must be a more elegant way to do this. I have tried something like this:

str.gsub!(/(?<=<h\d>).*?(?=<\/h\d>)/) { ['fooo', 'barr'].each {|v| v}}

But this didn't work, because the block returns ['fooo', 'barr'] EVERY TIME! If there is a way to make this block (or something else) return one element at a time (return 'fooo' the first time, then 'barr' the second), my problem will be solved! Thank you!

sunus
  • Why do you have HTML in a string? Is there an HTML document you're parsing? If so, it would be better to use an HTML parser like Nokogiri. – Mark Thomas Mar 02 '12 at 16:39
  • I think this is a common case :) It really doesn't matter whether this comes from HTML or something else – sunus Mar 02 '12 at 16:42
  • Actually, it matters quite a bit. You can save a lot of time and effort using an HTML parser if that is an option. And the result would be more robust. – Mark Thomas Mar 02 '12 at 16:46
  • @MarkThomas Yeah, I got your point. But still, I just want to know how to do 'multiple substitutions where each gets a different value' :):) Now I know how convenient it is to use an HTML parser :) Thank you! – sunus Mar 02 '12 at 16:53

1 Answer


You really have no business parsing HTML with a regexp: a library like Nokogiri makes this significantly easier because you can modify the DOM directly. That said, the mistake in your attempt is presuming that the each call hands gsub one element per substitution. gsub runs the block once per match and substitutes whatever the block returns, and each always returns the object being iterated, so every match is replaced with the whole array.
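To make the behaviour of each concrete, here is a minimal sketch, assuming a modern Ruby (1.9 or later, where Array#to_s is the same as inspect). The names replacements, same_object and text are made up for illustration and are not from the question:

```ruby
# Hypothetical names, used only for this illustration.
replacements = ['fooo', 'barr']

# Array#each ignores the block's values and returns its receiver,
# so the result is the very same array object.
result = replacements.each { |v| v }
same_object = result.equal?(replacements)

# gsub calls the block once per match and converts the block's value
# to a string, so each "x" is replaced by the array's string form.
text = 'x x'.gsub(/x/) { replacements.each { |v| v } }

puts same_object
puts text
```

Both matches end up replaced by the string form of the whole array, which is the symptom described in the question.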

Here's a way to avoid all the Regexp insanity:

require 'rubygems'
gem 'nokogiri'
require 'nokogiri'

str = "<div class=\"yui-u first\">\r\n\t\t\t\t\t<h1>Jonathan Doe</h1>\r\n
 \t\t\t\t\t<h2>Web   Designer, Director</h2>\r\n\t\t\t\t</div>"

html = Nokogiri::HTML(str)

h1 = html.at_css('h1')
h1.content = 'foo'

h2 = html.at_css('h2')
h2.content = 'bar'

puts html.to_s

If you want to do multiple substitutions where each gets a different value, the simple way is to just pull values off the front of an array, treating it as a queue:

subs = %w[ foo bar baz ]

string = "x x x"

string.gsub!(/x/) do |s|
  subs.shift
end

puts string.inspect
# => "foo bar baz"

Keep in mind that subs is consumed here. A more efficient approach would be to increment some kind of index variable and use that value instead, but this is a trivial modification.
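The index-based variant mentioned above could look roughly like this. It is only a sketch: the shortened str and the names index, values, by_index and by_enum are assumptions for illustration, and the regexp is the one from the question. The second form uses the enumerator that Array#each returns when called without a block, which is another way to get one element per call:

```ruby
subs = %w[fooo barr]
# Shortened stand-in for the string in the question (assumption).
str = "<h1>Jonathan Doe</h1>\r\n<h2>Web   Designer, Director</h2>"
pattern = /(?<=<h\d>).*?(?=<\/h\d>)/

# Index-based: advance a counter on every match; subs is left untouched.
index = -1
by_index = str.gsub(pattern) { subs[index += 1] }

# Enumerator-based: each call to next returns the following element.
values = subs.each
by_enum = str.gsub(pattern) { values.next }

puts by_index.inspect
puts by_enum.inspect
```

If there are more matches than replacement values, the index version substitutes an empty string (nil.to_s), while the enumerator version raises StopIteration, so pick whichever failure mode suits you.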

tadman
  • Wow!! I really, really love your answer, the one using a stack. I think that's what I've been looking for all night. Thank you so, so much! – sunus Mar 02 '12 at 16:51