
I followed this answer to log in with urllib2: Login to website using urllib2 - Python 2.7

I am trying to create a virtual map of storage spaces. To do this, I need to log into the management server, which is web-based; hence the use of BS4 and urllib.
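The linked approach boils down to POSTing the login form through an opener that keeps the session cookie, then fetching later pages with that same opener. Below is a minimal sketch in Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. The field names UserName and Password, the example URLs, and the helper names are hypothetical; copy the real field names from the site's login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_session_opener():
    """Return (opener, jar): an opener that stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def build_login_request(login_url, username, password, extra_fields=None):
    """Build, without sending, a form-encoded POST for a login form.

    "UserName" and "Password" are hypothetical field names; copy the real
    ones from the name="..." attributes of the site's login form.
    """
    fields = {"UserName": username, "Password": password}
    if extra_fields:
        fields.update(extra_fields)  # e.g. hidden fields such as an anti-forgery token
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Usage (hypothetical URLs; these lines perform real network requests):
# opener, jar = make_session_opener()
# opener.open(build_login_request("https://mgmt.example.com/Account/LogOn", "me", "secret"))
# html = opener.open("https://mgmt.example.com/Inventory").read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect exactly what will be posted before involving the server.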

Most of the requests work fine, but the HTML differs noticeably depending on whether the page is loaded after logging in manually (through the website) or after logging in with urllib.

Here is a snippet of the HTML when I log in with urllib:

<div id="gridContainer" class='grid_12'></div>

<form action="/Inventory/UnpendStorageSpaces" method="post"><input name="__RequestVerificationToken" type="hidden" value=">>>>>>BLOCKED VALUE>>>>=" /><input id="deviceKey" name="deviceKey" type="hidden" value="" /><input id="facilityItemKey" name="facilityItemKey" type="hidden" value="" />
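The form in this snippet carries a hidden __RequestVerificationToken field, which looks like ASP.NET MVC's anti-forgery token; any POST back to such a form is expected to echo it along with the other hidden fields. A minimal standard-library sketch for collecting those hidden inputs from the raw HTML (the names HiddenInputCollector and hidden_fields are hypothetical):

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collects name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)  # attribute names arrive lowercased; valueless ones map to None
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""

    # Self-closing tags such as <input ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered too.

def hidden_fields(html):
    """Return a dict of all hidden input fields found in html."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The resulting dict can be passed as the extra form data of a POST made with the same cookie-aware opener used for logging in.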

Here is a snippet of the HTML when I log in manually (through the website):

<div id="gridContainer" class="grid_12 gridContainer">
<div class="ui-jqgrid ui-widget ui-widget-content ui-corner-all" id="gbox_gridContainer_grid" dir="ltr" style="width: 940px;">
<div class="ui-widget-overlay jqgrid-overlay" id="lui_gridContainer_grid"></div><div class="loading ui-state-default ui-state-active" id="load_gridContainer_grid" style="display: none;">Loading ...</div>
<div class="ui-jqgrid-view" id="gview_gridContainer_grid" style="width: 940px;"><div class="ui-jqgrid-titlebar ui-widget-header ui-corner-top ui-helper-clearfix" style="display: none;"><a role="link" href="javascript:void(0)" class="ui-jqgrid-titlebar-close HeaderButton" style="right: 0px;"><span class="ui-icon ui-icon-circle-triangle-n"></span></a><span class="ui-jqgrid-title">

As you can see, the two are quite different. What is the best way to go about this?
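One way to confirm the difference in code: in the urllib response the gridContainer div has no children, while the browser's copy contains the jqGrid wrapper. A minimal standard-library sketch that counts the direct element children of an element by id in raw HTML (count_children and _ChildCounter are hypothetical helper names):

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class _ChildCounter(HTMLParser):
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # nesting depth inside the target (0 = outside)
        self.children = None    # None until the target element is seen

    def _open(self, tag, attrs, closes_immediately):
        if self.depth:
            if self.depth == 1:
                self.children += 1          # a direct child of the target
            if not closes_immediately:
                self.depth += 1
        elif self.children is None and dict(attrs).get("id") == self.target_id:
            self.children = 0
            if not closes_immediately:
                self.depth = 1

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, tag in VOID_TAGS)

    def handle_startendtag(self, tag, attrs):
        # <tag ... /> opens and closes at once: count it but do not descend.
        self._open(tag, attrs, True)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

def count_children(html, element_id):
    """Direct element children of #element_id in raw HTML, or None if absent."""
    parser = _ChildCounter(element_id)
    parser.feed(html)
    parser.close()
    return parser.children
```

A result of 0 for the urllib response, against 1 or more for the browser's HTML, indicates that the grid's contents are added client-side, so BS4 has nothing to find in the raw page.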

Asked by hpca01

1 Answer


You're getting a different result because the site uses JavaScript and jQuery to render the page, and that JavaScript never runs when you fetch the page with urllib.

Check out the first answer to "Web-scraping JavaScript page with Python" for an in-Python solution. Other options include Selenium or PhantomJS, but hopefully you won't need them.

Answered by Hugo
Sorry, had a dumb moment there, haha. I figured out how to scrape the dynamic content: basically, you have to use a headless WebKit browser. – hpca01 Feb 27 '17 at 23:59
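The headless-browser route can be sketched with Selenium, one of the options the answer lists. This is a minimal sketch assuming Selenium 4 with Chrome installed; it drives headless Chrome rather than a WebKit-based browser. The function name, the URLs, and the login field names UserName and Password are hypothetical. The id gbox_gridContainer_grid is taken from the manually loaded HTML in the question.

```python
def fetch_rendered_grid(login_url, page_url, username, password, timeout=15):
    """Log in with headless Chrome and return the page HTML after jqGrid
    has inserted its markup.

    Assumptions (hypothetical, check the real login form): the inputs are
    named "UserName" and "Password", and submitting the password field
    submits the login form.
    """
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(login_url)
        driver.find_element(By.NAME, "UserName").send_keys(username)
        password_box = driver.find_element(By.NAME, "Password")
        password_box.send_keys(password)
        before = driver.current_url
        password_box.submit()
        # Wait for the post-login navigation so the session cookie is set.
        WebDriverWait(driver, timeout).until(EC.url_changes(before))

        driver.get(page_url)
        # Wait until JavaScript has built the grid wrapper seen in the browser.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "gbox_gridContainer_grid"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned string is ordinary HTML, so it can be handed to BeautifulSoup just like the urllib response was.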