2

I'm writing a command-line development utility for my team, using Ruby.

I'm trying to inspect an HTML document on the filesystem and add a new <script> tag just before the closing </head> tag.

Something like:

<html>
  <head>
    <script src="...foo.js"></script>
    <script src="...bar.js"></script>
    <!-- I WANT TO INSERT NEW TEXT HERE -->
  </head>
  <body>
  </body>
</html>

I was thinking of starting with IO.readlines(file_name), comparing each line to a regex, and inserting my new tag ahead of the </head>. Then, merge the whole array back into a new version of the file.

This sounds overly complicated. Who's got a better way?

For bonus points, it would be great to have the right level of indentation.

Matt H.

2 Answers

5

Parsing HTML with regex is (very often) not a good idea.

If you want to parse and modify HTML with Ruby in a clean way, I recommend using Nokogiri.

http://nokogiri.org/

http://nokogiri.org/tutorials

moodywoody
1

In your particular case, it's not such a bad idea. If that special line is in the file, you can easily find it, take the indentation from the beginning of the line, and replace the whole line with new content. But don't do it in memory: you can write to a temporary file while reading the source file, so there's no need to eat up RAM.
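
A minimal sketch of that streaming approach, using only the standard library. The helper name `replace_marker_line` is hypothetical, and it assumes the marker appears on a line of its own, so that replacing the whole line is safe.

```ruby
require "tempfile"
require "fileutils"

# Hypothetical helper: replaces the first line containing `marker` with
# `new_line`, reusing that line's leading whitespace. Returns true if a
# line was replaced, false otherwise.
def replace_marker_line(path, marker, new_line)
  replaced = false
  # The temp file is created next to the target and removed after the block.
  Tempfile.create("insert_tag", File.dirname(path)) do |tmp|
    # File.foreach reads one line at a time, so the file is never fully in memory.
    File.foreach(path) do |line|
      if !replaced && line.include?(marker)
        indent = line[/\A[ \t]*/] # leading spaces/tabs of the marker line
        tmp.puts "#{indent}#{new_line}"
        replaced = true
      else
        tmp.write line
      end
    end
    tmp.flush
    # Copy the finished result over the original file.
    FileUtils.cp(tmp.path, path)
  end
  replaced
end
```

With the file from the question, a call might look like `replace_marker_line("index.html", "<!-- I WANT TO INSERT NEW TEXT HERE -->", '<script src="baz.js"></script>')`, where both the path and the script name are placeholders.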

If the HTML comment was just an example and that line isn't there, you can still replace the first occurrence of </head> with <script>...</script></head> using a regexp; there's no need to parse the HTML. (But this is only true in your particular case.) To be frank, you don't need Ruby either, because a sed command is perfect for this job.
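
The regexp replacement could look roughly like this in Ruby. It is a sketch under two assumptions: </head> sits on a line of its own, and children of <head> are indented two spaces deeper than the closing tag, as in the question's sample. The helper name `insert_before_head_close` is hypothetical.

```ruby
# Hypothetical helper: inserts `tag` on its own line before the first </head>.
# Returns the string unchanged if no </head> starts a line.
def insert_before_head_close(html, tag)
  html.sub(%r{^([ \t]*)(</head>)}i) do
    indent  = Regexp.last_match(1) # indentation of the closing tag
    closing = Regexp.last_match(2) # the closing tag exactly as written
    # Assumption: child elements are indented two spaces deeper than </head>.
    "#{indent}  #{tag}\n#{indent}#{closing}"
  end
end
```

Because `sub` only touches the first match, any later text that happens to contain </head> is left alone.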

Or, if you must do other checks, for example whether the script is already there, then use an HTML parser lib/gem. I suggest Hpricot if you like the concept of jQuery, because Hpricot has a very similar approach.

HTH

Sandor Bedo