I want to parse email addresses which are rendered this way:

<p class="email">
"Email: "
<script type="text/javascript"><!--
 document.write('f'+'o'+'<wbr/>@'+'e'+'x'+'p'+'.'+'c'+'o'); //-->
</script>
</p>

I'm using this code:

task import_emails: :environment do
  require 'mechanize'
  agent = Mechanize.new
  agent.get("URL")
  agent.page.search(".email").each do |email|
    puts email.text.strip
  end
end

It only returns "Email: ".

the Tin Man
Ilya Cherevkov

1 Answer

Nokogiri and Mechanize do not execute JavaScript, so the email text that the script adds with document.write never appears in the document they parse. That is why you only get "Email: " back.

If you want to select elements or text that are rendered by JavaScript, you'll need a tool that actually drives a browser, so that the page is rendered with its scripts executed. One example is Watir. Also take a look at Capybara and capybara-webkit.

See "How do I use Mechanize to process JavaScript?" for more details.
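As a lighter-weight alternative for this particular page, note that the obfuscated address is still present in the raw script source, so it can be rebuilt without a browser. The following is only a sketch of that workaround, not the browser-driven approach recommended above. It assumes the script always has the shape shown in the question (one document.write call concatenating single-quoted literals); `deobfuscate_email` is a hypothetical helper name.

```ruby
# Hypothetical helper: rebuilds the string that document.write would emit
# by joining the single-quoted literals inside the call, then drops markup.
# Assumes a single document.write('..'+'..'+...) call, as in the question.
def deobfuscate_email(script_source)
  # Grab the argument list of the first document.write(...) call.
  args = script_source[/document\.write\((.*?)\);/m, 1]
  return nil unless args

  # Collect every single-quoted literal and concatenate them in order.
  written = args.scan(/'([^']*)'/).flatten.join

  # Remove tags such as <wbr/> that were inserted to break up the address.
  written.gsub(/<[^>]+>/, "")
end

# Script body copied from the question's markup.
script = <<~JS
  <!--
   document.write('f'+'o'+'<wbr/>@'+'e'+'x'+'p'+'.'+'c'+'o'); //-->
JS

puts deobfuscate_email(script)
```

In the rake task, the script source would presumably come from the `script` element inside each `.email` node, for example `email.at("script")&.text`. This breaks as soon as the site changes its obfuscation, which is where a real browser driver becomes the more robust choice.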

Community
Chris Salzberg