I am attempting to scrape the website flow.gassco.no as one of my first Python projects. I need to get past the splash (disclaimer) screen, which redirects to the main page. I have isolated the following form:

<form method="get" action="acceptDisclaimer">
<input type="submit" value="Accept"/> 
<input type="button" name="decline" value="Decline"  onclick="window.location = 'http://www.gassco.no'" />
</form>

In a browser, appending 'acceptDisclaimer?' to the URL redirects to the target flow.gassco.no. However, if I try to replicate this with urllib2, I still appear to be on the same page when I print the source.

import urllib, urllib2
url="http://flow.gassco.no/acceptDisclaimer?"
url2="http://flow.gassco.no/"
#first pass to invoke disclaimer
req=urllib2.Request(url)
res=urllib2.urlopen(req)
#second pass to access main page
req1=urllib2.Request(url2)
res2=urllib2.urlopen(req1)
data=res2.read()
print data

I suspect that I have oversimplified the problem, but would appreciate any input into how I can accept the disclaimer and continue to output the main page source.

user3080213
    Is this exactly what your code looks like? Your urls are not quoted strings ("http://flow.gassco.no/") like they should be here. – Totem Dec 08 '13 at 17:44

1 Answer

  1. Use a cookie jar so that cookies set by the site are sent back on later requests. See the question "python: urllib2 how to send cookie with urlopen request".

  2. Open the main URL first.

  3. Open /acceptDisclaimer after that.
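
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tested solution for this site: it assumes the disclaimer acceptance is tracked through a cookie, and it uses the Python 3 modules `http.cookiejar` and `urllib.request` (the counterparts of `cookielib` and `urllib2` in Python 2). The helper names `build_opener_with_cookies` and `fetch_main_page` are hypothetical and chosen only for illustration.

```python
import http.cookiejar
import urllib.request

BASE_URL = "http://flow.gassco.no/"


def build_opener_with_cookies():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch_main_page(opener, base_url=BASE_URL):
    """Visit the main page, accept the disclaimer, then re-read the main page.

    Assumption: the server remembers the acceptance via a cookie, so all
    three requests must go through the same cookie-aware opener.
    """
    # Step 2: open the main URL first so the server can set its cookie.
    opener.open(base_url).read()
    # Step 3: accept the disclaimer within the same cookie session.
    opener.open(base_url + "acceptDisclaimer").read()
    # Request the main page again; this returns the raw bytes of the body.
    return opener.open(base_url).read()
```

To try it, create the opener with `opener, jar = build_opener_with_cookies()` and call `fetch_main_page(opener)`. The original code used a fresh `urllib2.urlopen` call for each request, which keeps no cookies between calls, so the second request would look like a brand-new visitor to the server.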

sureshvv