
I'm new to Python and I'm trying to use BeautifulSoup to parse an HTML page and extract some of its content. The problem is that the URL I need to parse is dynamic, so I can't hard-code it into `urllib2.urlopen` the way all the BeautifulSoup examples do.

I was trying to get the current URL from the browser using `self`, but I couldn't get this to work. Can anyone post an example of how to get the current URL from the browser using `self`, or how to point BeautifulSoup at the current URL?

Any help would be greatly appreciated.

Here's my code so far:

```python
import os
import time

import win32api
import win32com.client
import win32con

from pywinauto import application

class A(object):
  def __init__(self):
    self.x = self.request.url

  def method_a(self):
    print self.x

#start IE with a start URL of what was passed in
app = application.Application()
app.Start(r"c:\program files\internet explorer\iexplore.exe %s" % "http://www.cyclestreets.net/journey")
time.sleep(3)
#ie = app.window_(title_re = "CycleStreets Cycle journey planner")
ie = app.window_(title_re = ".*CycleStreets.*")

a = A()
a.method_a()
```

When I run this I get `AttributeError: 'A' object has no attribute 'request'`.
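The script above already knows the URL it opens in IE, so one option is to keep that URL in a variable and give the same value to the parser, rather than reading it back from the browser. A minimal Python 3 sketch, assuming the URL arrives as the first command-line argument (falling back to the CycleStreets URL from the code above). It uses the standard library's `html.parser` so it runs without extra packages; `get_start_url`, `TitleParser` and `extract_title` are hypothetical names for illustration:

```python
import sys
from html.parser import HTMLParser


def get_start_url(argv, default="http://www.cyclestreets.net/journey"):
    """Use the first command-line argument as the URL, else the default."""
    return argv[1] if len(argv) > 1 else default


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


def extract_title(html):
    """Parse an HTML string and return its stripped <title> text."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


if __name__ == "__main__":
    # The same value can be passed to IE and to the parser.
    print(get_start_url(sys.argv))
```

In a real script you would fetch the page with something like `urllib.request.urlopen(url).read().decode("utf-8")` (assuming the page is UTF-8) and pass the text to `extract_title`, or to `BeautifulSoup(html, "html.parser")` if bs4 is installed.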

markp3rry
  • Where's your URL coming from? Show us your current code and someone might be able to help you out... Is the dynamic URL part of the `BeautifulSoup` parsed HTML page? – cfedermann Apr 17 '12 at 08:45
  • Struggling to post the code into the comment (the backticks don't format it very well) so I'll add it into an answer below. – markp3rry Apr 17 '12 at 09:16
  • Right, OK - new to StackOverflow too :) – markp3rry Apr 17 '12 at 09:19
  • What exactly is `class A` supposed to be? It does _not_ have an attribute `request` which explains your `AttributeError`. Please clarify what you intend to do. – cfedermann Apr 17 '12 at 10:00
  • I'm trying to get the current URL from the browser. There's a highly rated answer on another SO thread [link](http://stackoverflow.com/questions/2764586/get-current-url-in-python) stating that I can use self.request.url but I'm clearly missing something. – markp3rry Apr 17 '12 at 10:23
  • The other SO thread is using Google's `appengine` which has a `self.request.url` attribute; in your code, it is not available, however. – cfedermann Apr 17 '12 at 10:27

2 Answers


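You can get the full URL of a request with `urllib.request`; see the example below. Note that `get_full_url()` returns the URL the `Request` object was built with, so it does not read the address from a running browser: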

```python
from urllib import request

url = "http://www.example.com"
# Build a Request object; no network traffic happens at this point.
req = request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'})
print(req.get_full_url())  # prints http://www.example.com
```

This may help you.
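If the "current" URL you care about is where the server finally sent you after redirects, the response returned by `urllib.request.urlopen()` (or by an opener's `open()`) has a `geturl()` method for that. A self-contained sketch that starts a throwaway local server so it runs offline; the `/start` and `/final` paths and the `final_url` helper are made up for the demo:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /start redirects to /final."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.end_headers()
        else:
            body = b"<html><title>final page</title></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Bypass any proxy settings so the local demo server is reached directly.
opener = build_opener(ProxyHandler({}))


def final_url(start_url):
    """Follow redirects and return the URL that was actually retrieved."""
    with opener.open(start_url) as resp:
        return resp.geturl()


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

start = "http://127.0.0.1:%d/start" % port
result = final_url(start)
print(result)  # ends in /final, not /start
server.shutdown()
server.server_close()
```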


I think you've got a little confused. In your class `A` you have this:

```python
class A(object):
  def __init__(self):
    self.x = self.request.url
```

Here you set `x`, inside `__init__`, to `self.request.url`. That raises the `AttributeError` because nothing in your code ever assigns a `request` attribute to the object, so `self.request` doesn't exist when `__init__` runs.
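A minimal Python 3 sketch of the difference: the first class reproduces the error, and the second avoids it by having the caller pass the URL into the constructor. Passing it in is one straightforward option, not the only one; the class name `Original` is hypothetical:

```python
class Original(object):
    """The question's version: nothing ever assigns self.request."""

    def __init__(self):
        self.x = self.request.url


error_message = ""
try:
    Original()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'Original' object has no attribute 'request'


class A(object):
    """Fixed version: the caller hands the URL in explicitly."""

    def __init__(self, url):
        self.x = url

    def method_a(self):
        print(self.x)


start_url = "http://www.cyclestreets.net/journey"
a = A(start_url)
a.method_a()  # prints http://www.cyclestreets.net/journey
```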

Ash
  • I'm still no closer to understanding how I can get the current URL in a Python script - can anyone help? – markp3rry Apr 19 '12 at 12:53
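Because the question's script already imports `win32com.client`, one way to read the address of a page open in Internet Explorer is the Windows Shell's `Shell.Application` COM object: its `Windows()` collection includes open IE windows, which expose `LocationURL` and `LocationName` properties. A sketch under those assumptions (pywin32 installed, classic Internet Explorer as in the question; other browsers don't expose their address this way). The helper `browser_urls` is hypothetical and takes the window collection as a parameter, so the filtering can be tried without Windows:

```python
def browser_urls(windows):
    """Return (title, url) pairs for shell windows that show a web page.

    `windows` is any iterable of objects with LocationName and LocationURL
    attributes, such as the collection from Shell.Application's Windows().
    """
    pages = []
    for w in windows:
        url = getattr(w, "LocationURL", "") or ""
        # Explorer folder windows also appear here, with file:/// URLs.
        if url.startswith(("http://", "https://")):
            pages.append((getattr(w, "LocationName", ""), url))
    return pages


def main():
    try:
        import win32com.client  # pywin32, Windows only
    except ImportError:
        print("pywin32 is not available; this part only runs on Windows.")
        return
    shell = win32com.client.Dispatch("Shell.Application")
    for title, url in browser_urls(shell.Windows()):
        print(title, url)


if __name__ == "__main__":
    main()
```

Once you have the URL of the CycleStreets window, you can fetch it with `urllib.request.urlopen(url)` and hand the result to BeautifulSoup as usual. Fetching the URL again gives you what the server returns for it, which may differ from what the browser shows if the page was changed by JavaScript.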