
I am using Python 3 on Windows 7.

I am unable to download some of the data shown on the following web page:

http://data.tsci.com.cn/stock/00939/STK_Broker.htm

1453.IMC 98.28M 18.44M 4.32 5.33
1499.Optiver 70.91M 13.29M 3.12 5.34
7387.花旗环球 52.72M 9.84M 2.32 5.36

When I use 'View Page Source' in Google Chrome, the data does not show up at all. However, when I use 'Inspect', I can read the data:

'<th>1453.IMC</th>'
'<td>98.28M</td>'
'<td>18.44M</td>'
'<td>4.32</td>'
'<td>5.33</td>'

'<th>1499.Optiver </th>'
'<td> 70.91M</td>'
'<td>13.29M </td>'
'<td>3.12</td>'
'<td>5.34</td>'

Could you please explain whether the data is hidden in a CSS style sheet, and whether there is any way to retrieve it?

Thank you

Regards, Crusier

from bs4 import BeautifulSoup
import requests

stock_code = ('00939', '0001')

def web_scraper(stock_code):

    broker_url = 'http://data.tsci.com.cn/stock/'
    end_url = '/STK_Broker.htm'

    for code in stock_code:

        new_url  = broker_url + code + end_url
        response = requests.get(new_url)
        html = response.content
        soup = BeautifulSoup(html, "html.parser")
        Buylist = soup.find_all('div', id ="BuyingSeats")
        Selllist = soup.find_all('div', id ="SellSeats")


        print(Buylist)
        print(Selllist)



web_scraper(stock_code)
Hank Pang
  • The web page initially loads as a mostly empty skeleton page, and the content is filled in using JavaScript. Your scraper code is just loading the skeleton and not running the JavaScript, so it doesn't see the table you want. This is a very common pattern and I'm sure there's an answer for this on Stack Overflow, so I'm commenting until I can find it and link it as a duplicate. – Spacedman Jul 29 '16 at 07:32
  • Perhaps https://stackoverflow.com/questions/2148493/scrape-html-generated-by-javascript-with-python ? – Tiger-222 Jul 29 '16 at 07:40

2 Answers


As someone already mentioned, Selenium is the way to go: it drives a real browser, so the page's JavaScript runs before you read the table.

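from selenium import webdriver
from selenium.webdriver.common.by import By

broker_url = 'http://data.tsci.com.cn/stock/00939/STK_Broker.htm'

mydriver = webdriver.Chrome()
mydriver.get(broker_url)

# Selenium 4 removed the find_element_by_* helpers; use find_element(By..., ...) instead.
BuyList = mydriver.find_element(By.CSS_SELECTOR, '#Buylist')
rows = BuyList.find_elements(By.TAG_NAME, 'tr')
for row in rows:
    print(row.text)

# Close the browser once the rows have been printed.
mydriver.quit()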
pawelty

The data is dynamically generated, but you can mimic the Ajax request the page makes and get the data in JSON format:

import requests

params = {"Code": "E00939",
          "PkgType": "11036",
          "val": "50"}
js = requests.get("http://data.tsci.com.cn/RDS.aspx", params=params).json()

print(js)

That gives you the table data like:

{u'BrokerBuy': [{u'AV': u'5.24',
                 u'BrokerNo': u'Optiver',
                 u'percent': u'10.09',
                 u'shares': u'43.06M',
                 u'turnover': u'225.67M'},
                {u'AV': u'5.26',
                 u'BrokerNo': u'UBS HK',
                 u'percent': u'4.81',
                 u'shares': u'20.47M',
                 u'turnover': u'107.63M'},
                {u'AV': u'5.22',
                 u'BrokerNo': u'\u4e2d\u94f6\u56fd\u9645',
                 u'percent': u'4.63',
                 u'shares': u'19.83M',
                 u'turnover': u'103.51M'},
                {u'AV': u'5.25',
                 u'BrokerNo': u'\u745e\u4fe1',
                 u'percent': u'3.88',
                 u'shares': u'16.54M',
                 u'turnover': u'86.82M'},
                {u'AV': u'5.24',
                 u'BrokerNo': u'IMC',
                 u'percent': u'3.84',
                 u'shares': u'16.38M',
                 u'turnover': u'85.89M'}],
 u'BrokerSell': [{u'AV': u'5.21',
                  u'BrokerNo': u'\u4e2d\u6295\u4fe1\u606f',
                  u'percent': u'8.90',
                  u'shares': u'38.19M',
                  u'turnover': u'199.12M'},
                 {u'AV': u'5.24',
                  u'BrokerNo': u'Optiver',
                  u'percent': u'5.51',
                  u'shares': u'23.55M',
                  u'turnover': u'123.29M'},
                 {u'AV': u'5.24',
                  u'BrokerNo': u'\u9ad8\u76db\u4e9a\u6d32',
                  u'percent': u'4.43',
                  u'shares': u'18.91M',
                  u'turnover': u'99.19M'},
                 {u'AV': u'5.28',
                  u'BrokerNo': u'JPMorgan',
                  u'percent': u'2.28',
                  u'shares': u'9.67M',
                  u'turnover': u'51.09M'},
                 {u'AV': u'5.25',
                  u'BrokerNo': u'IMC',
                  u'percent': u'0.88',
                  u'shares': u'3.76M',
                  u'turnover': u'19.70M'}],
 u'Buy': [{u'AV': u'5.24',
           u'BrokerNo': u'1499.Optiver',
           u'percent': u'10.09',
           u'shares': u'43.06M',
           u'turnover': u'225.67M'},
          {u'AV': u'5.24',
           u'BrokerNo': u'1453.IMC',
           u'percent': u'3.84',
           u'shares': u'16.38M',
           u'turnover': u'85.89M'},
          {u'AV': u'5.24',
           u'BrokerNo': u'7387.\u82b1\u65d7\u73af\u7403',
           u'percent': u'3.08',
           u'shares': u'13.16M',
           u'turnover': u'68.97M'},
          {u'AV': u'5.23',
           u'BrokerNo': u'6698.\u76c8\u900f\u8bc1\u5238',
           u'percent': u'1.74',
           u'shares': u'7.43M',
           u'turnover': u'38.86M'},
          {u'AV': u'5.21',
           u'BrokerNo': u'1799.\u8000\u624d\u8bc1\u5238',
           u'percent': u'1.44',
           u'shares': u'6.18M',
           u'turnover': u'32.16M'}],
 u'NetBuy': [{u'AV': u'5.25',
              u'BrokerNo': u'1499.Optiver',
              u'percent': u'4.58',
              u'shares': u'19.51M',
              u'turnover': u'102.37M'},
             {u'AV': u'5.24',
              u'BrokerNo': u'1453.IMC',
              u'percent': u'2.96',
              u'shares': u'12.62M',
              u'turnover': u'66.19M'},
             {u'AV': u'5.24',
              u'BrokerNo': u'7387.\u82b1\u65d7\u73af\u7403',
              u'percent': u'2.81',
              u'shares': u'11.98M',
              u'turnover': u'62.78M'},
             {u'AV': u'5.23',
              u'BrokerNo': u'6698.\u76c8\u900f\u8bc1\u5238',
              u'percent': u'1.66',
              u'shares': u'7.12M',
              u'turnover': u'37.24M'},
             {u'AV': u'5.26',
              u'BrokerNo': u'9065.UBS HK',
              u'percent': u'1.39',
              u'shares': u'5.91M',
              u'turnover': u'31.11M'}],
 u'NetNameBuy': [{u'AV': u'5.26',
                  u'BrokerNo': u'UBS HK',
                  u'percent': u'4.58',
                  u'shares': u'19.49M',
                  u'turnover': u'102.44M'},
                 {u'AV': u'5.25',
                  u'BrokerNo': u'Optiver',
                  u'percent': u'4.58',
                  u'shares': u'19.51M',
                  u'turnover': u'102.37M'},
                 {u'AV': u'5.22',
                  u'BrokerNo': u'\u4e2d\u94f6\u56fd\u9645',
                  u'percent': u'4.28',
                  u'shares': u'18.37M',
                  u'turnover': u'95.84M'},
                 {u'AV': u'5.24',
                  u'BrokerNo': u'\u745e\u4fe1',
                  u'percent': u'3.16',
                  u'shares': u'13.49M',
                  u'turnover': u'70.68M'},
                 {u'AV': u'5.24',
                  u'BrokerNo': u'IMC',
                  u'percent': u'2.96',
                  u'shares': u'12.62M',
                  u'turnover': u'66.19M'}],
 u'NetNameSell': [{u'AV': u'5.29',
                   u'BrokerNo': u'\u5174\u4e1a\u91d1\u878d',
                   u'percent': u'0.37',
                   u'shares': u'1.58M',
                   u'turnover': u'8.36M'},
                  {u'AV': u'5.25',
                   u'BrokerNo': u'\u4e2d\u56fd\u91d1\u878d',
                   u'percent': u'0.16',
                   u'shares': u'696K',
                   u'turnover': u'3.65M'},
                  {u'AV': u'5.32',
                   u'BrokerNo': u'\u94f6\u6cb3\u56fd\u9645',
                   u'percent': u'0.16',
                   u'shares': u'671K',
                   u'turnover': u'3.57M'},
                  {u'AV': u'5.29',
                   u'BrokerNo': u'Penjing',
                   u'percent': u'0.07',
                   u'shares': u'300K',
                   u'turnover': u'1.59M'},
                  {u'AV': u'5.31',
                   u'BrokerNo': u'\u5efa\u94f6\u56fd\u9645',
                   u'percent': u'0.06',
                   u'shares': u'272K',
                   u'turnover': u'1.44M'}],
 u'NetSell': [{u'AV': u'5.21',
               u'BrokerNo': u'6999.\u4e2d\u6295\u4fe1\u606f',
               u'percent': u'8.61',
               u'shares': u'36.93M',
               u'turnover': u'192.59M'},
              {u'AV': u'5.24',
               u'BrokerNo': u'3440.\u9ad8\u76db\u4e9a\u6d32',
               u'percent': u'4.03',
               u'shares': u'17.20M',
               u'turnover': u'90.15M'},
              {u'AV': u'5.30',
               u'BrokerNo': u'5337.JPMorgan',
               u'percent': u'0.67',
               u'shares': u'2.83M',
               u'turnover': u'15.00M'},
              {u'AV': u'5.29',
               u'BrokerNo': u'5980.\u5174\u4e1a\u91d1\u878d',
               u'percent': u'0.37',
               u'shares': u'1.58M',
               u'turnover': u'8.36M'},
              {u'AV': u'5.30',
               u'BrokerNo': u'8738.\u6c47\u4e30\u8bc1\u5238',
               u'percent': u'0.36',
               u'shares': u'1.53M',
               u'turnover': u'8.10M'}],
 u'Sell': [{u'AV': u'5.21',
            u'BrokerNo': u'6999.\u4e2d\u6295\u4fe1\u606f',
            u'percent': u'8.90',
            u'shares': u'38.19M',
            u'turnover': u'199.12M'},
           {u'AV': u'5.24',
            u'BrokerNo': u'1499.Optiver',
            u'percent': u'5.51',
            u'shares': u'23.55M',
            u'turnover': u'123.29M'},
           {u'AV': u'5.24',
            u'BrokerNo': u'3440.\u9ad8\u76db\u4e9a\u6d32',
            u'percent': u'4.19',
            u'shares': u'17.89M',
            u'turnover': u'93.75M'},
           {u'AV': u'5.25',
            u'BrokerNo': u'1453.IMC',
            u'percent': u'0.88',
            u'shares': u'3.76M',
            u'turnover': u'19.70M'},
           {u'AV': u'5.30',
            u'BrokerNo': u'5337.JPMorgan',
            u'percent': u'0.70',
            u'shares': u'2.96M',
            u'turnover': u'15.66M'}],
 u'Total': {u'In': u'1.26B',
            u'Net': u'5.800971E+08',
            u'Out': u'682.58M',
            u'right': u'98.71'}}

This contains all the table data; it is just a matter of using the keys to access what you need.
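
As a rough illustration of using the keys, the sketch below pulls one section (for example "Buy") out of a response shaped like the output above and converts the abbreviated figures to numbers. The sample dict is a hand-copied fragment of that output so the example runs offline. The names section_rows and to_number are hypothetical helpers, and reading K, M and B as thousand, million and billion is an assumption about the site's notation.

```python
from decimal import Decimal

# A fragment of the JSON shown above, copied by hand so this runs without a request.
sample = {
    "Buy": [
        {"AV": "5.24", "BrokerNo": "1499.Optiver", "percent": "10.09",
         "shares": "43.06M", "turnover": "225.67M"},
        {"AV": "5.24", "BrokerNo": "1453.IMC", "percent": "3.84",
         "shares": "16.38M", "turnover": "85.89M"},
    ],
    "Total": {"In": "1.26B", "Net": "5.800971E+08",
              "Out": "682.58M", "right": "98.71"},
}

# Column order chosen for display; the keys themselves come from the response.
FIELDS = ("BrokerNo", "turnover", "shares", "percent", "AV")

# Assumption: K/M/B mean thousand/million/billion.
SUFFIXES = {"K": Decimal(1000), "M": Decimal(1000000), "B": Decimal(1000000000)}


def section_rows(js, section):
    """Return one tuple per broker for the given section (hypothetical helper)."""
    # .get(...) avoids a KeyError if the section is absent from the response.
    return [tuple(entry.get(field, "") for field in FIELDS)
            for entry in js.get(section, [])]


def to_number(text):
    """Convert strings such as '43.06M' or '5.800971E+08' to a Decimal."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return Decimal(text[:-1]) * SUFFIXES[text[-1].upper()]
    return Decimal(text)


for row in section_rows(sample, "Buy"):
    print("\t".join(row))

print(to_number(sample["Total"]["Net"]))
```

With a live response, the same calls would take the js dict returned by requests in place of sample.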

So in your loop, just pass each code:

for code in stock_code:
    params["Code"] = "E{}".format(code)
    js = requests.get("http://data.tsci.com.cn/RDS.aspx", params=params).json()

One thing to note: 0001 does not work here, nor in your browser. What does work is 00001.
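
To avoid tripping over short codes like 0001, one option is to pad each code before building the parameters. This is only a sketch: build_params is a hypothetical helper, and it assumes the endpoint always expects "E" followed by a five-digit, zero-padded code, which is inferred from the two examples above rather than from any documentation.

```python
def build_params(code, width=5):
    """Build the query parameters for one stock code (hypothetical helper)."""
    # Assumption: the Code parameter is "E" plus the code zero-padded to 5 digits.
    return {"Code": "E" + str(code).zfill(width),
            "PkgType": "11036",
            "val": "50"}


# Show the Code value that would be sent for each of the asker's stock codes.
for code in ("00939", "0001"):
    print(build_params(code)["Code"])
```

The returned dict can be passed as params to requests.get in the loop above, in place of updating params["Code"] by hand.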

Padraic Cunningham