I've decided to take a swing at web scraping using Python (with lxml and requests). The webpage I'm trying to scrape to learn is: http://www.football-lineups.com/season/Real_Madrid/2013-2014

What I want to scrape is the table on the left of the webpage (the table with the scores and formations used). Here is the code I'm working with:

from lxml import html
import requests
page=requests.get("http://www.football-lineups.com/season/Real_Madrid/2013-2014")
tree=html.fromstring(page.text)
competition=tree.xpath('//*[@id="sptf"]/table/tbody/tr[2]/td[4]/font/text()')
print competition

The xpath that I input is the xpath that I copied over from Chrome. The code should normally return the competition of the first match in the table (i.e. La Liga). In other words, it should return the second row, fourth column entry (there is a random second column on the web layout, I don't know why). However, when I run the code, I get back an empty list. Where might this code be going wrong?

  • possible duplicate of [Why do browsers insert tbody element into table elements?](http://stackoverflow.com/questions/938083/why-do-browsers-insert-tbody-element-into-table-elements) – roippi Jun 11 '14 at 13:11
  • See the dupe. The `tbody` you see in your browser's dev tools is implicitly included in the DOM, but is not actually in the scraped source. – roippi Jun 11 '14 at 13:12
  • Thank you, I'm looking through it. I tried deleting "tbody" from the xpath and running the code, but I still got an empty list. –  Jun 11 '14 at 13:20
  • @roippi forgot to mention you –  Jun 11 '14 at 13:30

2 Answers

If you inspect the raw source of the page, you will see that the lineup table is not there. It is loaded via AJAX after the page itself has loaded, so you won't be able to fetch it just by requesting http://www.football-lineups.com/season/Real_Madrid/2013-2014: requests does not interpret the JavaScript, so the AJAX call is never executed.
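One way to check this without reading the page source by hand is to count the table rows inside the `sptf` element in the HTML that the server actually returns. The sketch below is written for Python 3 and uses only the standard-library `html.parser` instead of lxml, so it runs without extra packages. `RowCounter` and `count_rows` are names made up for this illustration, and the two HTML snippets are invented stand-ins, not the site's real markup.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> tags nested inside the element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside_tag = None  # tag name of the target element while we are inside it
        self.depth = 0          # nesting depth of that same tag name
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if self.inside_tag is None:
            if dict(attrs).get("id") == self.target_id:
                self.inside_tag = tag
                self.depth = 1
            return
        if tag == self.inside_tag:
            self.depth += 1
        if tag == "tr":
            self.rows += 1

    def handle_endtag(self, tag):
        if self.inside_tag is not None and tag == self.inside_tag:
            self.depth -= 1
            if self.depth == 0:
                self.inside_tag = None


def count_rows(html_text, target_id="sptf"):
    """Return how many <tr> elements sit inside the element with id target_id."""
    parser = RowCounter(target_id)
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Invented example: what a server might send before any JavaScript has run.
static_html = '<html><body><span id="sptf"></span></body></html>'

# Invented example: the same element after the browser has filled it in.
rendered_html = (
    '<span id="sptf"><table><tbody>'
    '<tr><td>row 1</td></tr><tr><td>row 2</td></tr>'
    '</tbody></table></span>'
)

print(count_rows(static_html))
print(count_rows(rendered_html))
```

To try it on the real page, pass `page.text` from the question's `requests.get(...)` call to `count_rows`. A result of zero would be consistent with the table being filled in later by JavaScript.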

The AJAX request is the following:

Maybe you can forge the request yourself to get this data. I'll let you work out what those well-named dX arguments mean :)
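A minimal sketch of what forging such a request could look like, using only Python 3's standard library. Everything specific here is an assumption: the endpoint `http://example.com/ajax/endpoint` and the `d1`/`d2` values are placeholders, and the real URL and arguments have to be copied from the browser's network tab. It also assumes the call is a GET and that the server may look at the `Referer` and `X-Requested-With` headers, which is not confirmed for this site. `build_ajax_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_request(endpoint, params, referer):
    """Build (but do not send) a GET request that imitates a browser's AJAX call."""
    url = endpoint + "?" + urlencode(params)
    headers = {
        "Referer": referer,                    # page the call would normally come from
        "X-Requested-With": "XMLHttpRequest",  # header many AJAX libraries add
        "User-Agent": "Mozilla/5.0",
    }
    return Request(url, headers=headers)


# Placeholder endpoint and arguments: replace them with the values seen
# in the browser's network tab. These are NOT the site's real ones.
req = build_ajax_request(
    "http://example.com/ajax/endpoint",
    {"d1": "placeholder", "d2": "placeholder"},
    "http://www.football-lineups.com/season/Real_Madrid/2013-2014",
)

print(req.get_method(), req.full_url)
```

Once the real values are filled in, `urllib.request.urlopen(req).read()` would send the request and return the response body, which can then be parsed like any other HTML fragment.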

Benoît Latinier

Here is complete code that does what you need:

from selenium import webdriver
import csv

url = "http://www.football-lineups.com/season/Real_Madrid/2013-2014"
driver = webdriver.Chrome('./chromedriver.exe')
driver.get(url)  # a real browser runs the page's JavaScript, so the table gets loaded

myfile = open('demo.csv', 'wb')  # binary mode is the Python 2 convention for csv files
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)

# Write one CSV row per table row, one field per cell
tr_list = driver.find_elements_by_xpath("//span[@id='sptf']/table/tbody/tr")
for tr in tr_list:
    lst = []
    for td in tr.find_elements_by_tag_name('td'):
        lst.append(td.text)
    wr.writerow(lst)

driver.quit()
myfile.close()
Piyush