I am trying to parse a website. I download the page source, traverse it with Nokogiri, and extract the information I need (links, content, etc.). I already have the script for extracting the data, but I ran into a problem: one of the links only works when you click it on the live site.
This is a sample of the source I'm trying to traverse:
<div class="story-item-content group">
<div class="story-item-details">
<h3 class="story-item-title">
<a href="/story/r/how_not_to_fix_your_computer_part_2" target="_blank" class="external-link ">How NOT to fix your computer, part 2.</a>
<span class="external-link-icon"></span>
</h3>
<p class="story-item-description">
<a href="/search?q=site:zug.com" class="story-item-source" title="More stories from zug.com">zug.com</a> <a href="/news/technology/how_not_to_fix_your_computer_part_2" class="story-item-teaser">— After you read this you should understand what not to do.
<span class="timestamp">21 hr 59 min ago</span></a>
<a class="crawl4link" href="http://crawl4.digg.internal/permalink/view/how_not_to_fix_your_computer_part_2">View in Crawl 4</a>
</p>
</div>
On line 4, the link href="/story/r/how_not_to_fix_your_computer_part_2"
only works on the live site. When I download the source and click the link, it doesn't work. I'm guessing the real link is stored on the server. How do I get the full link? I was thinking of writing a script that clicks the link, so that I can capture the working URL. Any idea how to do this? Thanks.
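To make the "script that clicks the link" idea concrete, here is a minimal sketch using only Ruby's standard library. It assumes the live site answers the /story/r/... path with an HTTP redirect to the real destination, which I have not confirmed. The base URL `http://example.com` and the helper names `absolute_url` and `resolve_redirects` are placeholders of mine, not anything from the site.

```ruby
require 'uri'
require 'net/http'

# Turn a site-relative href (as found in the downloaded source)
# into an absolute URL by joining it with the site's base URL.
def absolute_url(base, href)
  URI.join(base, href).to_s
end

# Request the URL and follow HTTP redirects, up to `limit` hops.
# Returns the last URL that did not redirect any further.
# Assumption: the site uses plain HTTP redirects, not JavaScript.
def resolve_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))
  if response.is_a?(Net::HTTPRedirection) && response['location']
    # The Location header may itself be relative, so join it again.
    resolve_redirects(URI.join(url, response['location']).to_s, limit - 1)
  else
    url
  end
end

# Usage (performs real network requests, so it is left commented out):
#   base = 'http://example.com' # placeholder for the live site's address
#   href = '/story/r/how_not_to_fix_your_computer_part_2'
#   puts resolve_redirects(absolute_url(base, href))
```

The href value would come from the Nokogiri traversal I already have. If the site turns out to redirect through JavaScript rather than HTTP, this approach would not be enough and a browser-driving tool would be needed instead.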