
I'm trying to get my Rails app to fetch the HTML source of a web page.

I want to get all of the HTML from a URI like /news_articles/7 into a string.

I tried using something like Nokogiri, but it seems to lock a mutex.

The purpose of this is to send a string of HTML to Amazon SES.

Thanks

aynber
Valkyrie0512
  • 1) I cannot parse your **... get all of the HTML a URI ...**. 2) What is Nokugiri? 3) **It seem** => It seems 4) **reason** => purpose – sawa Mar 26 '14 at 14:51
  • You can just go to your app, right-click and `view source`? – Richard Peck Mar 26 '14 at 14:51
  • You probably want to use something like `ActionMailer` with SES, instead of trying to render a page into a string. http://stackoverflow.com/questions/4798437/using-amazon-ses-with-rails-actionmailer – Casper Mar 26 '14 at 14:52

1 Answer


Nokogiri in combination with Mechanize will serve you well.

Gemfile

gem 'nokogiri'
gem 'mechanize'

controller

require 'mechanize'

agent = Mechanize.new
# also follow <meta http-equiv="refresh"> redirects (HTTP redirects are followed by default)
agent.follow_meta_refresh = true
# get the desired page
page = agent.get('http://www.mysite.com/news_articles/7')
# the page's HTML source as a string
html = page.body
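
If pulling in Mechanize feels heavy for a single GET, the same idea can be sketched with Ruby's standard library. This is only a rough sketch: `fetch_html` and its `redirect_limit` parameter are names I made up for illustration, and the error handling is deliberately minimal.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of any library): fetch a URL and return
# the response body, i.e. the page's HTML, as a String.
def fetch_html(url, redirect_limit = 5)
  raise ArgumentError, 'too many redirects' if redirect_limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_html(next_url, redirect_limit - 1)
  else
    raise "Request failed: #{response.code} #{response.message}"
  end
end
```

Usage would be along the lines of `html = fetch_html('http://www.mysite.com/news_articles/7')`. Unlike Mechanize, this does not follow meta-refresh tags or keep cookies.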

Possible Duplicate

Community
davegson
  • I think he's trying to run this from within a Rails request cycle. Hence he will deadlock the whole Rails app. That's his main problem. – Casper Mar 26 '14 at 15:06
  • Yup, that makes more sense. I guess you provided the right link – davegson Mar 26 '14 at 15:08
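
To illustrate the point raised in the comments: if the page belongs to the same app, the HTML can be built in-process instead of being fetched over HTTP, so the app never waits on a request to itself. In a Rails controller that is what `render_to_string` is for. The sketch below shows the same idea with plain ERB from the standard library; the `Article` struct, the template and `article_html` are all made up for illustration.

```ruby
require 'erb'

# Hypothetical stand-in for a model; a real app would use its own record.
Article = Struct.new(:title, :body)

# Hypothetical template; in Rails this would live in a view file.
ARTICLE_TEMPLATE = ERB.new(<<~HTML)
  <html>
    <body>
      <h1><%= ERB::Util.html_escape(article.title) %></h1>
      <p><%= ERB::Util.html_escape(article.body) %></p>
    </body>
  </html>
HTML

# Render the template to a String without making any HTTP request.
def article_html(article)
  ARTICLE_TEMPLATE.result_with_hash(article: article)
end
```

Calling `article_html(Article.new('Title', 'Text'))` returns the markup as a string, which could then be handed to whatever sends the email.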