I have a script that records a span field on a website using the requests module in Python.

from lxml import html
import requests

r = requests.get(url)
tree = html.fromstring(r.content)
while 1:
    print(str(tree.xpath('//span[@id="ofr"]/text()')))

However, this span keeps updating, and I would like to refresh its value without reloading the entire page. I cannot find a solution for this. Many thanks.

mikarific
  • requests is not a browser, it does not execute JavaScript that may be included in the page. It can only get the HTML of the page as it is *before* any client-side JavaScript is executed. Can you tell *how* the span is updated? What is happening on the client? –  Dec 09 '16 at 15:39
  • Thanks Lutz, the span is automatically refreshed, there's no interaction needed. Does that answer your question? Your comment did, however, point me in the right direction; I believe this is a similar issue: http://stackoverflow.com/questions/8960288/get-page-generated-with-javascript-in-python – mikarific Dec 09 '16 at 15:49
  • No, it does not. Automatically by what mechanism? JavaScript? A `refresh` meta tag? –  Dec 09 '16 at 15:50
  • it's a javascript element – mikarific Dec 09 '16 at 16:03

1 Answer

You need to put the requests.get call inside the while 1 loop; otherwise no new request to the website is made. _lastValue holds the span's value from the previous pass, and the script sleeps for one second between lookups.

from lxml import html
import time
import requests

_lastValue = None
while 1:
    # Fetch and parse the page again on every pass
    r = requests.get(url)
    tree = html.fromstring(r.content)
    _currentValue = str(tree.xpath('//span[@id="ofr"]/text()'))

    # Print only when the value has changed since the last pass
    if _currentValue != _lastValue:
        print(_currentValue)
        _lastValue = _currentValue

    time.sleep(1)
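
The change detection in the loop above can also be separated from the fetching, which makes it easier to swap in a different data source later. The following is only a minimal sketch: poll_changes and its parameters (fetch, interval, max_polls, sleep) are hypothetical names that do not come from the question, and fetch is assumed to be any zero-argument callable that returns the current text of the span.

```python
import time


def poll_changes(fetch, interval=1.0, max_polls=None, sleep=time.sleep):
    """Yield each value returned by fetch() that differs from the previous one.

    fetch     -- hypothetical zero-argument callable returning the current value,
                 e.g. a function wrapping requests.get and the XPath lookup
    interval  -- seconds to wait between two lookups
    max_polls -- stop after this many lookups (None means poll forever)
    sleep     -- injectable sleep function, handy for testing
    """
    last_value = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current_value = fetch()
        # Report the value only when it changed since the last lookup
        if current_value != last_value:
            yield current_value
            last_value = current_value
        polls += 1
        sleep(interval)
```

With a fetch function that performs the request and the XPath lookup, the loop reduces to iterating over poll_changes(fetch) and printing each value. Note that this still downloads the whole page on every lookup; it only tidies up the bookkeeping.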
Maurice Meyer
  • Thanks Maurice, that was my initial approach; however, reloading the entire website causes too much latency. I am trying to read price data from an updating field. – mikarific Dec 09 '16 at 15:35
  • This will not work because the page will probably be changed by client side JavaScript. –  Dec 09 '16 at 15:40
  • You need to reload the entire page; that is how the HTTP protocol works (unless the span changes via XHR/Ajax requests) – Maurice Meyer Dec 09 '16 at 15:41
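
If the span does turn out to be filled by an XHR/Ajax request, one option is to call that request's URL directly instead of downloading the full page. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that such an endpoint exists (it would have to be found in the browser's network tab), that it returns a JSON object, and that the value sits under a key named "ofr". The function name read_price and all of its parameters are hypothetical.

```python
def read_price(session, endpoint, key="ofr"):
    """Return one field from the JSON document served at endpoint.

    session  -- a requests.Session (or any object with a compatible get())
    endpoint -- hypothetical URL of the XHR request the page makes
    key      -- hypothetical name of the JSON field holding the value
    """
    response = session.get(endpoint, timeout=5)
    # Raise an exception on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # Assumes the body is a JSON object; returns None if the key is absent
    return response.json().get(key)
```

Passing in a requests.Session() rather than calling requests.get each time lets the underlying connection be reused between lookups, which may reduce the latency mentioned above. If the page receives its updates over a different channel, such as a WebSocket, this approach does not apply.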