
I have a situation where I am unable to parse data from the following link, for example:

https://www.bseindia.com/stock-share-price/avanti-feeds-ltd/avanti/512573/

From this webpage I want to populate the High/Lows table. I have tried many combinations of table and div selectors, but in vain. Below is my Python BeautifulSoup (bs4) code:

import csv
import urllib.request
from bs4 import BeautifulSoup

out = open('bse.csv', 'w', newline='')
writer = csv.writer(out)

with open("bselist.csv") as f:
    for row in csv.reader(f):
        for stock in row:
            url = "https://www.bseindia.com/stock-share-price/{}".format(stock)
            soup = BeautifulSoup(urllib.request.urlopen(url).read(), "lxml")
            mydivs = soup('div', {"class": "newscripcotent5"})[0].find_all('span')
            writer.writerow([stock] + mydivs)
            print([stock] + mydivs)

For simplicity, the URL above is a direct link to one of the records contained in the file bselist.csv. I am looking for the div with id "highlow".

It just gives me the following output:

avanti-feeds-ltd/avanti/512573/

without the table I am looking for.

Ideally, the output should look something like the following:

avanti-feeds-ltd/avanti/512573/ 52 Week High (adjusted) 999.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (adjusted)  410.26(05/06/2018)
avanti-feeds-ltd/avanti/512573/ 52 Week High (Unadjusted)   3,000.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (Unadjusted)    535.50(29/06/2018)
avanti-feeds-ltd/avanti/512573/ Month H/L   659.34/410.26
avanti-feeds-ltd/avanti/512573/ Week H/L    625.25/508.82
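Independent of the fetching problem, the row-extraction logic itself can be exercised against a static HTML snippet. The snippet below is made up to mimic the nested-table structure the page appears to use (the class name "newscripcotent5" comes from the question; the values are the sample figures from the expected output, not live data):

```python
from bs4 import BeautifulSoup

# Made-up HTML mimicking the div/nested-table layout described in the question.
html = """
<div class="newscripcotent5">
  <table><tr><td>
    <table id="highlow">
      <tr><td>52 Week High (adjusted)</td><td>999.00(13/11/2017)</td></tr>
      <tr><td>52 Week Low (adjusted)</td><td>410.26(05/06/2018)</td></tr>
      <tr><td colspan="2">a single-cell header-ish row</td></tr>
    </table>
  </td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "newscripcotent5"})
inner = div.find("table").find("table")          # the nested table holds the rows
rows = []
for tr in inner.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 2:                          # skip rows without a label/value pair
        rows.append((cells[0], cells[1]))

for label, value in rows:
    print(label, value)
```

If this walk works on a static snippet but returns nothing on the live page, that points to the content not being present in the raw HTML at all, which is what the answer below confirms.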
CDspace
Mandar

1 Answer


The info you are trying to fetch is populated dynamically using JavaScript, which is probably why you are not able to fetch it with a plain HTTP request. To get around this, you can use the Selenium WebDriver to render the page in a real browser and then parse the resulting HTML.

This is how the code looks:

import csv
from bs4 import BeautifulSoup
from selenium import webdriver

# One browser instance for the whole run; creating a new driver per stock is
# slow and leaves windows behind.
driver = webdriver.Chrome('/path/to/chromedriver')

with open('bse.csv', 'w') as output_file, open("bselist.csv") as f:
    for row in csv.reader(f):
        for stock in row:
            url = "https://www.bseindia.com/stock-share-price/{}".format(stock)
            driver.get(url)
            soup = BeautifulSoup(driver.page_source, "html.parser")
            div = soup.find_all('div', {"class": "newscripcotent5"})[0]
            outer_table = div.find_all('table')[0]
            inner_table = outer_table.findChildren("table")[0]
            for table_row in inner_table.findChildren("tr"):
                cols = table_row.findChildren("td")
                if len(cols) < 2:
                    continue
                output_file.write(stock + "," + cols[0].getText() + "," + cols[1].getText() + "\n")
                print(stock + " " + cols[0].getText() + " " + cols[1].getText())

driver.quit()  # close the browser; the with block closes both files

Be sure to replace /path/to/chromedriver with the appropriate path to chromedriver.

So assuming your bselist.csv contains:

avanti-feeds-ltd/avanti/512573/

You'll get the following output:

avanti-feeds-ltd/avanti/512573/ 52 Week High (adjusted) 999.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (adjusted) 410.26(05/06/2018)
avanti-feeds-ltd/avanti/512573/ 52 Week High (Unadjusted) 3,000.00(13/11/2017)
avanti-feeds-ltd/avanti/512573/ 52 Week Low (Unadjusted) 507.00(02/07/2018)
avanti-feeds-ltd/avanti/512573/ Month H/L 659.34/410.26
avanti-feeds-ltd/avanti/512573/ Week H/L 615.00/507.00
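One caveat with joining fields by hand with `","`: some values, e.g. 3,000.00(13/11/2017), contain commas themselves, so the resulting line parses back as four fields instead of three. Letting the csv module write the row quotes such fields automatically. A minimal sketch with made-up values:

```python
import csv
import io

stock = "avanti-feeds-ltd/avanti/512573/"
fields = [stock, "52 Week High (Unadjusted)", "3,000.00(13/11/2017)"]

buf = io.StringIO()
csv.writer(buf).writerow(fields)      # the comma-bearing value gets quoted
line = buf.getvalue().strip()
print(line)

# Reading the line back recovers exactly three fields, not four.
parsed = next(csv.reader(io.StringIO(line)))
```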

If you don't already have selenium and chromedriver, you'll need to install them first. I installed them like this on macOS:

sudo easy_install selenium
sudo easy_install chromedriver

(On a more recent setup, `pip install selenium` works as well, and chromedriver can be downloaded directly from its official site.)


pgngp
  • Hi friend, I am getting the error ModuleNotFoundError: No module named 'selenium'. Spent more than an hour fixing it, but in vain. I am using Python 3.6.4 with selenium==3.13.0, and the pip freeze command also works fine. Don't know how to resolve this. I am on Windows 10. – Mandar Jul 04 '18 at 16:54
  • Update: Thanks a ton. I just copy-pasted the code into a cmd window and it's working like a charm, except for a few issues. All the Chrome windows remain open, so I have to close them manually. Also, the CSV is not getting written, though I can see the output on screen. I also get errors like Unable to read VR Path Registry. – Mandar Jul 04 '18 at 19:15
  • Is there any other solution that does not involve invoking a Chrome browser window every time? Can't normal bs4 handle this, like the code I have written? – Mandar Jul 04 '18 at 19:17
  • There might be some way to not open chrome windows, but I don't know off the top of my head. As far as I know, normal bs4 won't be able to extract dynamic content from the webpage. There might be other modules besides `selenium` that may be able to achieve the same thing, but I'm not aware of them. Some google search would help. – pgngp Jul 05 '18 at 06:39