
I'm trying to extract the table from this webpage, but I'm not sure I'm selecting the right tags. Here is what I have so far:

from bs4 import BeautifulSoup
import requests

page='http://www.airchina.com.cn/www/en/html/index/ir/traffic/'

r=requests.get(page)

soup=BeautifulSoup(r.text)

test=soup.findAll('div', {'class': 'main noneBg'})
rows=test.findAll("td")

Is `main noneBg` the table? When I hover over that tag, it does highlight the table.

jason

1 Answer


The table you need is in the iframe that is loaded from a different URL.
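If you want to find such an iframe URL programmatically instead of reading it from the browser's developer tools, something like the following might work. This is only a sketch: `OUTER_HTML` is a hypothetical stand-in for the outer page's markup (the real page may be structured differently), and it assumes the iframe carries a plain `src` attribute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "http://www.airchina.com.cn/www/en/html/index/ir/traffic/"

# Hypothetical markup standing in for the outer page; the real page may differ.
OUTER_HTML = """
<div class="main noneBg">
  <iframe src="/www/jsp/airlines_operating_data/exlshow_en.jsp"></iframe>
</div>
"""

soup = BeautifulSoup(OUTER_HTML, "html.parser")

# Collect every iframe that has a src attribute and resolve it against the page URL.
iframe_urls = [
    urljoin(PAGE_URL, frame["src"])
    for frame in soup.find_all("iframe", src=True)
]

print(iframe_urls)
```

Each resolved URL can then be fetched with `requests.get` and parsed on its own, since the iframe's content is a separate document.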

Here's how you can grab it (note that the URL is different):

from bs4 import BeautifulSoup
import requests

# URL of the iframe's content, not of the outer page
page = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'

r = requests.get(page)

soup = BeautifulSoup(r.text)

# The data table is in the second div found inside div.mainRight
div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)
# recursive=False restricts each search to direct children
for row in table.find_all('tr', recursive=False):
    for cell in row.find_all('td', recursive=False):
        print(cell.text.strip())

prints:

Feb 2014
% change vs Feb 2013
% change vs Jan 2014
Cumulative Feb 2014
% cumulative change
1.Traffic
1.RTKs (in millions)
1407.8
...

Note that you need to use recursive=False due to the nested tables on the page.
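To see what `recursive=False` changes, here is a small self-contained sketch. The markup in `HTML` is made up for illustration (it is not the Air China page): an outer table whose second cell contains a nested table.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table with a nested table in its second cell.
HTML = """
<table id="outer">
  <tr>
    <td>outer cell</td>
    <td>
      <table id="inner">
        <tr><td>inner cell</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer = soup.find("table", id="outer")

# Default search: descends into the nested table as well.
all_rows = outer.find_all("tr")
all_cells = outer.find_all("td")

# recursive=False: only direct children of the tag being searched.
direct_rows = outer.find_all("tr", recursive=False)
direct_cells = direct_rows[0].find_all("td", recursive=False)

print(len(all_rows), len(all_cells))
print(len(direct_rows), len(direct_cells))
```

Without `recursive=False`, the rows and cells of the inner table are mixed into the results for the outer one, which is why the loop above restricts each search to direct children.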

alecxe
  • `print cell.text UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position 3: illegal multibyte sequence` getting this error on the last line. – jason Apr 02 '14 at 13:38
  • I'm sorry i'm a beginner. What would the code look like? `cell.text.decode('utf-8').split()` ? – jason Apr 02 '14 at 13:53
  • `return codecs.utf_8_decode(input, errors, True) UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3: ordinal not in range(128)` looks like similar issue – jason Apr 03 '14 at 01:37
  • ha. ok. that will take me a while to figure out. i'll accept your answer first. – jason Apr 03 '14 at 01:40
  • why does your code work for you but not for me? different version of python? – jason Apr 03 '14 at 02:10
  • Bravo!!! worked. your last comment fixed it. Thank you! But why does the code work for you without that line? i'm curious to know. I'm running python 2.7. – jason Apr 03 '14 at 02:24
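The replies that resolved the `UnicodeEncodeError` are not preserved in this thread, so the following is not necessarily the fix that was suggested. The error itself means the console encoding (`gbk` in the first comment) has no mapping for the non-breaking space `\xa0` in the cell text; a console that uses UTF-8 can encode that character, which is one likely reason the same code runs without error on another machine. A minimal sketch of one possible workaround, written for Python 3 and using a hypothetical helper named `printable`:

```python
def printable(text, encoding="gbk"):
    """Return text that can be written to a console using `encoding`.

    `printable` is a hypothetical helper, not part of BeautifulSoup.
    """
    # Replace non-breaking spaces with ordinary spaces first.
    cleaned = text.replace("\xa0", " ")
    # Substitute "?" for anything else the target encoding cannot represent.
    return cleaned.encode(encoding, errors="replace").decode(encoding)


print(printable("1407.8\xa0"))
```

In the scraping loop, this would mean printing `printable(cell.text.strip())` instead of the raw cell text.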