I'm logged into a webpage/servlet using Mechanize.

I have a page object:

jobShortListPg = agent.get(addressOfPage)

When I use:

puts jobShortListPg

I get the "mechanized" version of the page, which I don't want:

#<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171">

How do I get the HTML source of the page instead?

the Tin Man
Waley Chen

3 Answers


Use .body:

puts jobShortListPg.body
the Tin Man
Dogbert

Use the content method of the page object.

jobShortListPg.content
the Tin Man
ramasamyz

In Nokogiri, use to_s or to_html on the document node:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <head></head>
  <body>foo</body>
</html>
EOT

doc.to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
#    "<html>\n" +
#    "  <head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"></head>\n" +
#    "  <body>foo</body>\n" +
#    "</html>\n"

or:

doc.to_s
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
#    "<html>\n" +
#    "  <head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"></head>\n" +
#    "  <body>foo</body>\n" +
#    "</html>\n"

If the embedded newlines are distracting, print the string with puts instead:

puts doc.to_s

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html>
# >>   <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
# >>   <body>foo</body>
# >> </html>
the Tin Man