I used Python 3 and Beautiful Soup 4 to parse a web page from the Hong Kong Stock Exchange. However, the table (i.e. No. of listed companies ... No. of listed H shares ...) under "HONG KONG AND MAINLAND MARKET HIGHLIGHTS" cannot be extracted. Here is the link: "https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Hong-Kong-and-Mainland-Market-Highlights?sc_lang=en#select3=0&select2=10&select1=0" Kindly advise.

My code:

import requests
from bs4 import BeautifulSoup
import csv
import sys
import os

result = requests.get("https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Hong-Kong-and-Mainland-Market-Highlights?sc_lang=en#select3=0&select2=10&select1=3")

result.raise_for_status()
result.encoding = "utf-8"


src = result.content
soup = BeautifulSoup(src, 'lxml')
print(soup.prettify())


print(" ")
print("soup.pretty() printed")
print(" ")
wait = input("PRESS ENTER TO CONTINUE.")

table = soup.find_all('table')
print(table)

print(" ")
print("TABLE printed")
print(" ")
wait2 = input("PRESS ENTER TO CONTINUE.")
Arthur Law
  • You need to render the page first (as it uses some JavaScript) in order to view your table. Possible duplicate of [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520) – Janez Kuhar Nov 05 '19 at 02:31
  • Does this answer your question? [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – Janez Kuhar Nov 05 '19 at 02:44

1 Answer

There is no need to render the page first, as you can get the data back in JSON format. The tricky part is that the JSON describes how to render the table (with td fields, colspan fields, etc.), so a little work is needed to iterate through it, but it is not impossible:

import requests
import pandas as pd

url = 'https://www.hkex.com.hk/eng/csm/ws/Highlightsearch.asmx/GetData'

# Date to query: day, month and year
payload = {
    'LangCode': 'en',
    'TDD': '1',
    'TMM': '11',
    'TYYYY': '2019'}

jsonData = requests.get(url, params=payload).json()

rows = []
for row in jsonData['data']:
    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        # Repeat each cell as many times as its colspan says
        colspan_int = int(colspan[0])
        data_row.append(row['td'][idx] * colspan_int)
    # Flatten the list of lists into a single row of cells
    flat_list = [item for sublist in data_row for item in sublist]
    rows.append(flat_list)

# Build the frame once from the collected rows
final_df = pd.DataFrame(rows)

# Keep the label and Main Board columns of the market capitalisation row
mask = final_df[0].str.contains(r'Total market capitalisation(?!$)', na=False)
df = final_df[mask].iloc[:, :2].copy()
# Date string built from the payload values
df['date'] = '{TDD}/{TMM}/{TYYYY}'.format(**payload)
df.to_csv('file.csv', index=False)

Output:

print (final_df.to_string())
                                                    0                                      1                                      2                                           3                                           4                                           5                                           6
0                                                      Hong Kong <br>Exchange (01/11/2019  )  Hong Kong <br>Exchange (01/11/2019  )  Shanghai  Stock<br>Exchange (01/11/2019  )  Shanghai  Stock<br>Exchange (01/11/2019  )  Shenzhen  Stock<br>Exchange (01/11/2019  )  Shenzhen  Stock<br>Exchange (01/11/2019  )
1                                                                                 Main Board                                    GEM                                     A Share                                     B Share                                     A Share                                     B Share
2                             No. of listed companies                                  2,031                                    383                                       1,488                                          50                                       2,178                                          47
3                              No. of listed H shares                                    256                                     22                                        n.a.                                        n.a.                                        n.a.                                        n.a.
4                      No. of listed red-chips stocks                                    170                                      5                                        n.a.                                        n.a.                                        n.a.                                        n.a.
5                      Total no. of listed securities                                 12,573                                    384                                        n.a.                                        n.a.                                        n.a.                                        n.a.
6       Total market capitalisation<br>(Bil. dollars)                             HKD 31,956                                HKD 109                                  RMB 32,945                                      RMB 81                                  RMB 22,237                                      RMB 50
7   Total negotiable <br>capitalisation (Bil. doll...                                   n.a.                                   n.a.                                  RMB 28,756                                      RMB 81                                  RMB 16,938                                      RMB 49
8                           Average P/E ratio (Times)                                  11.16                                  19.76                                       13.90                                        9.18                                       24.70                                        9.55
9                    Total turnover <br>(Mil. shares)                                196,082                                    560                                      15,881                                          15                                      22,655                                          14
10                  Total turnover <br>(Mil. dollars)                             HKD 79,397                                HKD 160                                 RMB 169,934                                      RMB 85                                 RMB 260,208                                      RMB 57
11            Total market turnover<br>(Mil. dollars)                             HKD 79,557                             HKD 79,557                                 RMB 176,232                                 RMB 176,232                                 RMB 260,264                                 RMB 260,264
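
To make the colspan handling concrete, here is a minimal offline sketch. The `sample` dict is hypothetical: its shape (parallel `td` and `colspan` lists of one-item lists) is inferred from the loop above, not from any HKEX documentation. `clean_cell` and `expand_row` are illustrative helper names, and the cleaning step only strips the `<br>` tags and extra spaces visible in the output above.

```python
import re

# Hypothetical sample shaped like the structure the loop above assumes:
# each row carries parallel 'td' and 'colspan' lists of one-item lists.
sample = {
    "data": [
        {"td": [[""], ["Hong Kong <br>Exchange (01/11/2019  )"]],
         "colspan": [["1"], ["2"]]},
        {"td": [["Total market capitalisation<br>(Bil. dollars)"],
                ["HKD 31,956"], ["HKD 109"]],
         "colspan": [["1"], ["1"], ["1"]]},
    ]
}


def clean_cell(text):
    """Replace <br> tags with a space and collapse runs of whitespace."""
    text = re.sub(r"<br\s*/?>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def expand_row(row):
    """Repeat each cell colspan times so every row has the same width."""
    cells = []
    for td, colspan in zip(row["td"], row["colspan"]):
        cells.extend(td * int(colspan[0]))
    return [clean_cell(cell) for cell in cells]


rows = [expand_row(row) for row in sample["data"]]
for row in rows:
    print(row)
```

The same `expand_row` could be applied to each entry of `jsonData['data']` before building the DataFrame, provided the real response has the shape assumed here.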
chitown88
  • Thank you very much for your answer. If I want to extract "Total market capitalisation (Bil. dollars) HKD 31,956" into a CSV file in a format such as "5 NOV, 32650; 4 NOV, 32475", etc., how do I do it? – Arthur Law Nov 06 '19 at 00:20
  • Simply put: how do I extract the date and the total market capitalisation (HKD 31,956), or any other specific value, into a single row of a CSV file? – Arthur Law Nov 06 '19 at 00:32
  • Thank you very much. How do I download the results for the pull-down list selections "Announcements and Notices > Reorganisation/Change in Shareholding/Major Changes/Public Float/Listing Status > Announcement by Offeree Company under the Takeovers Code" from "https://www1.hkexnews.hk/search/titlesearch.xhtml"? I tried to modify the code you gave, but it does not work. Kindly advise. – Arthur Law Nov 10 '19 at 06:26
  • I’ll take a look in a bit. Hopefully it’s just simply a parameter in the payload query that needs to be added. But we’ll see – chitown88 Nov 10 '19 at 09:12
  • I am trying to apply the code to other sections of the website. How did you find the correct link for the data extract, "https://www.hkex.com.hk/eng/csm/ws/Highlightsearch.asmx/GetData", from the above web page? Thanks. – Arthur Law Nov 15 '19 at 00:37
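
For the CSV question raised in the comments, one possible approach is sketched below using only the standard library. It assumes each date's table has already been turned into a list of rows, as in the answer, and that the Main Board figure sits in column 1, as in the output shown there. The function names and the `market_cap_hkd_bil` header are made up for illustration. Fetching each date would mean repeating the request with different `TDD`, `TMM` and `TYYYY` values, which is left out here.

```python
import csv
import io
import re


def main_board_market_cap(rows):
    """Return the Main Board market cap (column 1) as an int, or None."""
    for row in rows:
        if row and row[0].startswith("Total market capitalisation"):
            # "HKD 31,956" -> "31956"; only safe for whole-number cells
            digits = re.sub(r"[^\d]", "", row[1])
            return int(digits) if digits else None
    return None


def write_market_caps(tables_by_date, out):
    """Write one 'date,market cap' line per date to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["date", "market_cap_hkd_bil"])
    for date, rows in tables_by_date.items():
        writer.writerow([date, main_board_market_cap(rows)])


# Two rows copied from the 01/11/2019 output in the answer
tables_by_date = {
    "01/11/2019": [
        ["No. of listed companies", "2,031", "383"],
        ["Total market capitalisation<br>(Bil. dollars)", "HKD 31,956", "HKD 109"],
    ],
}

buffer = io.StringIO()
write_market_caps(tables_by_date, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `open('file.csv', 'w', newline='')` and pass the handle in place of `buffer`.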