
I need to replace a large set of broken HTML links in a file. For that, I need a find/replace that does not use regular expressions at all, i.e. the kind of basic Find/Replace you would do in Notepad. I came across a Ruby one-liner that should do exactly that:

ruby -p -i -e "gsub('<a href=\"index.php?option=com_content&amp;view=article&amp;id=130&amp;catid=111&amp;Itemid=324\">Home</a>', 'NEWLINK')" test.txt

However, test.txt is not changed, and no output is printed. (I don't know much about Ruby, so I may just be missing something obvious.) Is there any other tool that does what I need?

Edit: I'd expect the following test.txt:

<a href=\"index.php?option=com_content&amp;view=article&amp;id=130&amp;catid=111&amp;Itemid=324\">Home</a>

...to be changed to:

NEWLINK

Thanks

Carla
  • Could you please post a clearer sample of the input and expected output in your question, so the question is easier to understand? Thank you. – RavinderSingh13 Jul 07 '21 at 13:09
  • Mandatory [don't parse HTML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) link. – glenn jackman Jul 07 '21 at 14:55
  • Well, that's weird. Any tool like gedit can do a find/replace of the HTML string above. Is there any shell tool or language capable of doing the same thing? – Carla Jul 07 '21 at 16:17
  • I think what @glennjackman is trying to tell you is to use an HTML parser such as Nokogiri instead of a regex. – max Jul 08 '21 at 01:15


Instead of a regular expression, consider using an HTML parser, which actually understands HTML and won't leave you with a broken document.

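# link_parser.rb
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end

# Nokogiri decodes entities when parsing, so the href value it sees
# contains plain "&" rather than "&amp;".
OLD_HREF = 'index.php?option=com_content&view=article&id=130&catid=111&Itemid=324'

fn = ARGV[0]
abort 'Usage: ruby link_parser.rb path/to/file.html' if fn.nil?

if File.exist?(fn)
  puts "Processing #{fn}..."
  doc = Nokogiri::HTML(File.read(fn))
  links = doc.css(%(a[href="#{OLD_HREF}"]))
  if links.any?
    links.each do |link|
      # Nokogiri nodes expose attributes via [] / []=, not as methods
      link['href'] = 'NEWLINK'
    end
    # File.write truncates the file first, so no stale bytes are left
    # behind when the new document is shorter than the old one.
    File.write(fn, doc.to_html)
    puts "#{links.length} links replaced"
  else
    puts 'No links found'
  end
else
  puts 'File not found.'
end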
Run it with:

ruby link_parser.rb path/to/file.html
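If you really do want a plain, Notepad-style find/replace, plain Ruby can do that too: `String#gsub` matches a String pattern literally, so characters like `?` and `.` need no escaping. Below is a minimal sketch, assuming the whole file fits in memory; the script name `literal_replace.rb` and the method `literal_replace` are hypothetical, not part of the answer above. A likely reason the original one-liner changed nothing: inside the double-quoted shell argument, `\"` becomes a plain `"`, while the sample test.txt in the question literally contains `\"`, so the pattern never matches.

```ruby
# literal_replace.rb (hypothetical helper)
# Usage: ruby literal_replace.rb FILE OLD_TEXT NEW_TEXT

# Replace every literal occurrence of old_text with new_text.
# A String pattern is matched literally (no regex), and the block form
# keeps sequences like "\0" in new_text from being read as backreferences.
# Returns [new_text_of_file, number_of_replacements].
def literal_replace(text, old_text, new_text)
  count = 0
  result = text.gsub(old_text) do
    count += 1
    new_text
  end
  [result, count]
end

# Only touch files when run directly with arguments.
if $PROGRAM_NAME == __FILE__ && ARGV.any?
  abort 'Usage: ruby literal_replace.rb FILE OLD_TEXT NEW_TEXT' unless ARGV.length == 3
  file, old_text, new_text = ARGV
  result, count = literal_replace(File.read(file), old_text, new_text)
  File.write(file, result)
  puts "#{count} occurrence(s) replaced"
end
```

In a POSIX shell such as bash, wrapping OLD_TEXT in single quotes passes it through unchanged, which avoids the `\"` escaping problem entirely.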
max