I have a script that inserts URLs into existing XHTML pages. The URLs have tracking codes with ampersands, and Nokogiri automatically replaces them with the escaped version, &amp;. I understand why, but the escaped URL means that the tracking doesn't work, as the tracking code has been changed.

I've checked out "How to save unescaped & in nokogiri xml?", "How can i put a string with an ampersand in an xml file with Nokogiri?", and "Preventing Nokogiri from escaping characters?", but I'm not quite sure how using the builder or CDATA would work in the context of what I'm trying to do.

Here's a simplified version of what I am currently doing (with main_link being pulled from an external source):

require "nokogiri"

# Parse the existing XHTML page.
doc = Nokogiri::XML(File.read("file.xhtml"))
link = doc.css("a")[0] # the actual file may contain multiple links, not just one
main_link = "http://www.url.com/"
tag = "?blah&blah=blahblah"
# Nokogiri escapes the ampersand when the attribute is serialized.
link["href"] = main_link + tag
new_content = doc.to_xml
File.open("new_file.xhtml", "w") { |f| f.write(new_content) }

#=> <a href="http://www.url.com/?blah&amp;blah=blahblah">link</a>
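
For context, here is a minimal sketch of what an entity-aware consumer (a browser or an XML parser) does with the escaped attribute value when it reads the file back. CGI.unescapeHTML from Ruby's standard library stands in for that decoding step. It is only an illustration: whether the tracking system decodes entities like this is an assumption the sketch cannot settle.

```ruby
require "cgi"

# The href value exactly as it appears in the serialized XHTML above.
escaped_href = "http://www.url.com/?blah&amp;blah=blahblah"

# Entity decoding, as a parser does when it reads an attribute value.
decoded_href = CGI.unescapeHTML(escaped_href)

puts decoded_href
```

If whatever follows the link parses the XHTML first, it should see the original ampersand; the problem only shows up for tools that read the raw file text.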

I've done this, which works:

content = File.read("file.xhtml")
content.gsub!("&amp;","&")
File.open("updated_file.xhtml", 'w') { |file| file.write(content) }

#=> <a href="http://www.url.com/?blah&blah=blahblah">link</a>

but I'd like to avoid reopening/resaving files, since I'm working with a lot at one time and want to be as efficient as possible.

Is this doable with Nokogiri? Should I be looking elsewhere to accomplish this?

lumos
  • If you want to avoid reopening/resaving files, why not put the `gsub!` code in the original script (`new_content.gsub!(...)`)? – Jordan Running Feb 06 '19 at 20:39
  • @JordanRunning yeah that's actually what I was doing in my code (I displayed it differently in my question just to make it clearer), but I just wasn't sure if there was a better way that avoided `gsub` altogether – lumos Feb 06 '19 at 20:55
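
The in-memory variant discussed in the comments could be sketched roughly as follows. The string `serialized` is a stand-in for what `doc.to_xml` returns in the script above, and `unescape_href_ampersands` is a hypothetical helper name, not part of Nokogiri. It narrows the substitution to double-quoted href values so that escaped ampersands in ordinary text are left alone. Note that the result is no longer well-formed XML.

```ruby
# Stand-in for the string returned by doc.to_xml in the question's script.
serialized = '<p>Q&amp;A: <a href="http://www.url.com/?blah&amp;blah=blahblah">link</a></p>'

# Hypothetical helper: un-escape ampersands only inside href="..." values,
# leaving text nodes and other attributes untouched.
def unescape_href_ampersands(markup)
  markup.gsub(/href="[^"]*"/) { |attr| attr.gsub("&amp;", "&") }
end

patched = unescape_href_ampersands(serialized)
puts patched
```

The patched string can then be written once with File.open, so no file has to be reopened.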

0 Answers