Extract data from BSE website

Question

How can I extract the values of Security ID, Security Code, Group / Index, Wtd.Avg Price, Trade Date, Quantity Traded, and % of Deliverable Quantity to Traded Quantity using Python 3 and save them to an XLS file? The link is below.

https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/

PS: I am completely new to Python. I know there are a few libraries that make scraping easier, such as BeautifulSoup, selenium, requests, and lxml, but I don't know much about them.

Edit 1: I tried something

from bs4 import BeautifulSoup
import requests
URL = 'https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/'
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('div', attrs = {'id':'newheaddivgrey'})
print(table)

The output is None. I was expecting to get all the tables on the page and then filter them to get the required data.

import requests
import lxml.html
URL = 'https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/'
r = requests.get(URL)
root = lxml.html.fromstring(r.content)
title = root.xpath('//*[@id="SecuritywiseDeliveryPosition"]/table/tbody/tr/td/table/tbody/tr[1]/td')
print(title)

I tried the lxml code above as well. Same problem.

Edit 2: I tried Selenium, but I am still not getting the table contents.

from selenium import webdriver
driver = webdriver.Chrome(r"C:\Program Files\JetBrains\PyCharm Community Edition 2017.3.3\bin\chromedriver.exe")
driver.get('https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/')
table=driver.find_elements_by_xpath('//*[@id="SecuritywiseDeliveryPosition"]/table/tbody/tr/td/table/tbody/tr[1]/td')
print(table)
driver.quit()

Output is [<selenium.webdriver.remote.webelement.WebElement (session="befdd4f01e6152942c9cfc7c563a6bf2", element="0.13124528538297953-1")>]

https://stackoverflow.com/questions/2081586/web-scraping-with-python — Keyur Potdar, Mar 07 '18 at 12:51
Welcome to Stack Overflow! Please [edit] your question to show [the code you have so far](http://whathaveyoutried.com). You should include at least an outline (but preferably a [mcve]) of the code that you are having problems with, then we can try to help with the specific problem. You should also read [ask]. — Toby Speight, Mar 07 '18 at 12:53
the `requests` library is best suited to individual HTTP requests. It will not download the whole web page, only the HTML file at the requested location; this means that any other content, such as that loaded by Javascript, will not be downloaded. When opening the URL provided, a loading message appears momentarily, suggesting that the actual data is indeed loaded by JS. Consider looking through the site's JS and reverse-engineering it to directly fetch the data you need, rather than scraping it from a rendered page. — speedstyle, Mar 07 '18 at 17:55
Can you please help me with the small piece of code that scraps javascript data? Please use the link given in my post. — yadav, Mar 07 '18 at 18:06
The easiest way to handle this page is to use [Selenium](https://pypi.python.org/pypi/selenium). Each of the different tables on this page is loaded dynamically through different AJAX requests. So, if you want to use `requests` to scrape this, you'll have to get the data from multiple urls (which will become a bit tedious for someone who is new to scraping). For e.g., the table containing *Security ID*, *Security Code*, etc is loaded from this url - https://www.bseindia.com/SiteCache/1D/CompanyHeader.aspx?Type=EQ&text=532419 — Keyur Potdar, Mar 08 '18 at 04:41
How did you get the below link. https://www.bseindia.com/SiteCache/1D/CompanyHeader.aspx?Type=EQ&text=532419 . I have tried using selenium but not getting the table contents. Included code in the original post. — yadav, Mar 08 '18 at 14:35
@KeyurPotdar. Can you please look in to my code using selenium. — yadav, Mar 09 '18 at 17:12
Have a look at this question: https://stackoverflow.com/questions/7263824/get-html-source-of-webelement-in-selenium-webdriver-using-python — Keyur Potdar, Mar 09 '18 at 17:46
I checked the link. It is for the older version of selenium. I am using selenium 3.10 — yadav, Mar 10 '18 at 08:23

score 1 · Accepted Answer · answered Mar 10 '18 at 13:23

After loading the page with Selenium, you can get the JavaScript-modified page source from driver.page_source. You can then pass this page source to BeautifulSoup.

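from bs4 import BeautifulSoup
from selenium import webdriver

# Let Chrome load the page and run its JavaScript, then grab the rendered HTML.
driver = webdriver.Chrome()
driver.get('https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/')
html = driver.page_source
driver.quit()

# Parse the rendered HTML and pick out the delivery-position container.
soup = BeautifulSoup(html, 'lxml')
table = soup.find('div', id='SecuritywiseDeliveryPosition')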

This code will give you the Securitywise Delivery Position table in the table variable. You can then parse this BeautifulSoup object to get the different values you want.
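As a rough sketch of that parsing step, the snippet below turns two-cell rows inside the div into a label-to-value dictionary. The sample_html string, the values in it, and the table_to_dict helper are all made up for illustration; the real BSE markup may be nested differently, so the row filter would likely need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the structure and the
# values are invented for this example and may not match the live page.
sample_html = """
<div id="SecuritywiseDeliveryPosition">
  <table>
    <tr><td>Trade Date</td><td>09 Mar 18</td></tr>
    <tr><td>Quantity Traded</td><td>1,234</td></tr>
    <tr><td>% of Deliverable Quantity to Traded Quantity</td><td>56.78</td></tr>
  </table>
</div>
"""


def table_to_dict(html, div_id):
    """Collect label/value pairs from two-cell rows inside the given div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:
        # The div is missing, e.g. because the JavaScript had not run yet.
        return {}
    values = {}
    for row in container.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Keep only rows that look like "label, value"; skip anything else.
        if len(cells) == 2:
            values[cells[0]] = cells[1]
    return values


delivery = table_to_dict(sample_html, "SecuritywiseDeliveryPosition")
print(delivery)
```

With the real page, you would pass html from driver.page_source instead of sample_html.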

The soup object contains the full page source, including the elements that were added dynamically, so the other fields you mentioned (Security ID, Security Code, Group / Index, and so on) can be located in it the same way.
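For the saving step, here is a minimal sketch that writes the collected values to a CSV file, which Excel can open. The record contents, the save_record helper, and the output file name are placeholders, not values taken from the site. It uses only the standard library; if a real Excel workbook is required, pandas' DataFrame.to_excel is one option, assuming pandas and an Excel writer such as openpyxl are installed.

```python
import csv
import os
import tempfile

# Placeholder values standing in for whatever was scraped from the page.
record = {
    "Security ID": "SMARTLINK",
    "Security Code": "532419",
    "Trade Date": "09 Mar 18",
    "Quantity Traded": "1,234",
}


def save_record(record, path):
    """Write one header row and one data row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(record))
        writer.writeheader()
        writer.writerow(record)


# Hypothetical output location; change it to wherever the file should go.
output_path = os.path.join(tempfile.gettempdir(), "bse_data.csv")
save_record(record, output_path)
print("Saved to", output_path)
```

To store several days or several securities, you would collect one dictionary per row and call writer.writerows on the list instead.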
