Use nokogiri or mechanize to parse emails, which are rendered using JavaScript

Question

I want to parse email addresses which are rendered this way:

<p class="email">
"Email: "
<script type="text/javascript"><!--
 document.write('f'+'o'+'<wbr/>@'+'e'+'x'+'p'+'.'+'c'+'o'); //-->
</script>
</p>

I'm using this code:

task import_emails: :environment do
  require 'mechanize'
  agent = Mechanize.new
  agent.get("URL")
  agent.page.search(".email").each do |email|
    puts email.text.strip
  end
end

It only returns "Email: ".

score 0 · Accepted Answer · edited May 23 '17 at 10:34

Nokogiri and Mechanize do not execute JavaScript, so the text that the page adds with document.write never appears in the document they parse. That is why you only get "Email: " back.

To select elements or text that are rendered by JavaScript, you need a tool that actually drives a browser, so that the page is rendered with its scripts executed. One example is Watir. Also take a look at Capybara and capybara-webkit.

See "How do I use Mechanize to process JavaScript?" for more details.
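
For an obfuscation as simple as the one in the question, a browser may not be strictly necessary: the address is just a concatenation of quoted string literals inside a single document.write call, so it can be rebuilt with a regular expression. The following is only a sketch under that assumption. decode_document_write is a hypothetical helper, not part of Nokogiri or Mechanize, and it assumes you can get the script source as a string (for example from the script node inside .email, which is not verified against the real page). It will not work for anything that needs real JavaScript evaluation.

```ruby
# Hypothetical helper (not part of Nokogiri or Mechanize): rebuilds the string
# passed to document.write when it is a plain concatenation of quoted literals.
def decode_document_write(js)
  # Grab the argument list of the first document.write(...) call.
  args = js[/document\.write\((.*?)\)\s*;/m, 1]
  return nil unless args

  # Collect every single- or double-quoted literal and join them in order.
  literals = args.scan(/'([^']*)'|"([^"]*)"/).map { |single, double| single || double }
  joined = literals.join

  # Drop markup such as <wbr/> that the page inserts to break up the address.
  joined.gsub(/<[^>]+>/, "")
end

# Script source copied from the question.
script_source = <<~JS
  <!--
   document.write('f'+'o'+'<wbr/>@'+'e'+'x'+'p'+'.'+'c'+'o'); //-->
JS

puts decode_document_write(script_source)
```

If the page builds the address in any other way (variables, function calls, character codes), this approach breaks down and a browser-driving tool such as Watir remains the more robust choice.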

answered Jan 24 '13 at 23:53

Chris Salzberg


Thanks, Watir is a good option. I already installed it, and it seems I can capture JavaScript functions – Ilya Cherevkov Jan 24 '13 at 23:59
I added capybara-webkit as well, if you want something that doesn't require a browser actually being open all the time (which can be annoying). – Chris Salzberg Jan 25 '13 at 00:00
