
I need help figuring out a programming problem that I've been working on.

The problem description:

Write a function in Ruby that accepts an HTML document (a string) and a keyword (also a string). The function will find all occurrences of the keyword in the HTML string after the <body> element unless the keyword appears within an HTML tag, then surround the string found with tags to "highlight" the keyword. For example,

<span style="background-color: blue; color: white">keyword</span>

You will have to be careful not to highlight strings occurring within an HTML tag. For example, if the keyword is "table", you wouldn’t want to mark up this:

<table width="100%" border="0">

What I have done so far:

puts "Welcome to the HTML keyword highlighter!"
puts "Please Enter A Keyword: "
keyword = gets.chomp
canEdit = false 

infile = File.new("desktop/code.html", "r")
outfile = File.new("Result.html", "w")

infile.each{ |i| 
    if (i.include? "<body>")
        canEdit = true

    end

    if (i.include? "</body>")
        canEdit = false
    end

    if(canEdit == true)
        keyword.gsub(keyword, "<span style=\"background-color: yellow; color: black\">#{keyword}</span>")

    outfile.write i
end

outfile.close()
infile.close()
}

The error I receive currently:

Welcome to the HTML keyword highlighter!

Please Enter A Keyword:

simple

/Users/Eva/Desktop/Personal/part4_program.rb:16:in `each': closed stream (IOError)

from /Users/Eva/Desktop/Personal/part4_program.rb:16:in `<main>'

I'm unsure what is causing the error and could use some guidance to fix the issue. I am also wondering if this program is heading in the right direction as an answer to the programming problem. I know Nokogiri is already available as a resource but I had hoped not to have to use it unless it's thought to be a better option.

Mr Lister
mm19
  • It's a better option. http://stackoverflow.com/a/1732454/438992 – Dave Newton Nov 02 '16 at 16:28
  • Why is it better though? I have tried installing it with no success. I get this error every time I try to install it: ERROR: While executing gem ... (Gem::FilePermissionError) You don't have write permissions for the /Library/Ruby/Gems/2.0.0 directory. – mm19 Nov 02 '16 at 16:30
  • YMMV, but this might help with installation: http://stackoverflow.com/questions/14607193/installing-gem-or-updating-rubygems-fails-with-permissions-error – orde Nov 02 '16 at 17:16
  • You should take the 3 seconds it takes to properly format and most importantly indent your code. Not only is it impolite to ask people for help when you are not even willing to spend the 3 seconds it takes to make your code readable, but in this specific case, the error will be immediately obvious, and you wouldn't even have had to ask the question in the first place. – Jörg W Mittag Nov 02 '16 at 18:05
  • I don't understand how I've formatted it incorrectly. It's formatted the same way that I have it in my original code. – mm19 Nov 02 '16 at 18:19
  • It's better because parsing HTML is hard, and it's really easy to do it wrong. – Dave Newton Nov 02 '16 at 18:22

1 Answer

I'm unsure what is causing the error and could use some guidance to fix the issue.

Let's first apply some proper formatting to your code, to see more clearly what is going on:

puts 'Welcome to the HTML keyword highlighter!'
puts 'Please Enter A Keyword: '
keyword = gets.chomp
can_edit = false 

infile = File.new('desktop/code.html', 'r')
outfile = File.new('Result.html', 'w')

infile.each {|i| 
  if i.include?('<body>')
    can_edit = true
  end

  if i.include?('</body>')
    can_edit = false
  end

  if can_edit
    keyword.gsub(keyword, %Q[<span style="background-color: yellow; color: black">#{keyword}</span>])
    outfile.write i
  end

  outfile.close
  infile.close
}

The error message says:

    part4_program.rb:16:in `each': closed stream (IOError)

So, what is happening is that you try to iterate using each over a closed file. And why is that? Well, now that the code is indented properly, we can easily see that you close both infile and outfile inside of the each iterator. This will lead to all sorts of problems:

  • You close the file while each is still iterating over it. This "pulls the rug out from under each's feet", so to speak. How can it iterate over the file when the file is closed? You are lucky that each detects this and gives you a nice error message and a clean exit; closing the file out from under the iterator that is currently reading it could have led to much subtler and harder-to-diagnose problems.
  • Even if each didn't break because you closed the file out from under it, you would still call close every time you go through the iteration, but you can only close a file once; after that it is already closed and can't be closed again.
  • And even if you could close files multiple times, you write to outfile, but you already closed it during the previous iteration. You can't write to a closed file.
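
A minimal sketch of one way to avoid all three problems, assuming the intent is to copy every line to the output and to substitute only between <body> and </body>. The method name highlight_file and its parameters are hypothetical, not part of the original program. It uses the block form of File.open, which closes each file once, after the iteration has finished. It also uses the return value of gsub on the line itself, since gsub returns a new string rather than changing anything in place. The substitution is still the naive one, so this fixes the IOError only, not the highlighting logic.

```ruby
# Hypothetical helper: the name and the path parameters are assumptions.
def highlight_file(in_path, out_path, keyword)
  span = %Q[<span style="background-color: yellow; color: black">#{keyword}</span>]
  can_edit = false

  # The block form of File.open closes the file when the block returns,
  # even if an exception is raised, so there is no explicit close call.
  File.open(in_path, 'r') do |infile|
    File.open(out_path, 'w') do |outfile|
      infile.each_line do |line|
        can_edit = true  if line.include?('<body>')
        can_edit = false if line.include?('</body>')

        # gsub returns a new string, so its result has to be kept.
        # Still naive: this also rewrites text inside tags.
        line = line.gsub(keyword, span) if can_edit

        # Every line is written, so the part outside <body> is preserved.
        outfile.write(line)
      end
    end
  end
end
```

It could be called as highlight_file('desktop/code.html', 'Result.html', keyword), using the paths from the question.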

I am also wondering if this program is heading in the right direction as an answer to the programming problem.

Honestly, I don't even remotely understand what you are trying to do. But I am going to say "No", you are not heading in the right direction.

Here are just a couple of simple ways to break your code:

  • what if the keyword is table?
  • what if <body> and </body> are on the same line?
  • what if the keyword appears on the same line as <body> but before it?
  • what if someone spells it <BODY> or <bOdY> instead?
  • what about optional tags?
  • what about Null End Tags?
  • what if the keyword appears inside a comment?
  • what if the keyword appears inside a tag?
  • what if the keyword appears inside an attribute?
  • what if the keyword appears inside a <script> element?
  • what if the keyword appears inside a <style> element?
  • what if the keyword appears inside a <![CDATA[ section?
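
The first item is easy to reproduce. This sketch (naive_highlight is a hypothetical name) applies the same kind of plain gsub substitution to the table example from the problem statement:

```ruby
# Hypothetical helper showing a plain string substitution with no HTML awareness.
def naive_highlight(html, keyword)
  html.gsub(keyword, %Q[<span style="background-color: yellow; color: black">#{keyword}</span>])
end

line = '<table width="100%" border="0">'
puts naive_highlight(line, 'table')
```

The printed line starts with `<<span`, because the tag name itself was wrapped, and the result is no longer a valid table tag.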

I know Nokogiri is already available as a resource but I had hoped not to have to use it unless it's thought to be a better option.

HTML is complex. Really complex. Really, really complex. Unless you have some very good reasons to re-invent the wheel, you should re-use the work someone else has already done. Without even thinking too hard, I could come up with more than half a dozen ways to break your parser, and I didn't even get into the nasty corner cases. (Simply because I don't know the nasty corner cases, because I don't need to know them, because somebody else has already figured them all out.)

The two fundamentals of Programming are Abstraction and Reuse. Creating Reusable Abstractions and Reusing other programmers' Abstractions.

Jörg W Mittag
  • Thank you for answering my question. I appreciate the time you took to go through everything with me. If you believe I cannot achieve this with my own code I will certainly use Nokogiri. Thanks again. – mm19 Nov 03 '16 at 14:48