
I want to harvest information from a table on a given website using Beautiful Soup and Python 3.

I have also tried the XPath method but still cannot find a way to obtain the data.

from urllib.request import urlopen
from bs4 import BeautifulSoup

coaches = 'https://www.badmintonengland.co.uk/coach/find-a-coach'
coachespage = urlopen(coaches)
soup = BeautifulSoup(coachespage, features="html.parser")
data = soup.find_all("tbody", {"id": "JGrid-az-com-1031-tbody"})

def crawler(table):
    for mytable in table:  
        try:
            rows = mytable.find_all('tr')
            for tr in rows:
                cols = tr.find_all('td')
                for td in cols:
                    return td.text
        except:
            raise ValueError("no data")


print(crawler(data))
  • I went to the website and found that we have to enter "postalcode" and "distance" in order to see the table. Can you please tell me where you are writing that code? – Pallamolla Sai Apr 07 '19 at 15:05
  • I filtered the distance with the "any" option, and the table list is shown. Hopefully I am getting your question right. – Nathan Kirui Apr 07 '19 at 15:11
  • Beautiful Soup can't get the updated HTML that is set by JS, so I think it's better to use Selenium (see the sketch after these comments). Are you using it? – Pallamolla Sai Apr 07 '19 at 15:42
  • This link might help: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – Pallamolla Sai Apr 07 '19 at 15:42
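
To illustrate the point made in the comments, a minimal check (assuming the tbody id from the question) shows that the statically served HTML does not yet contain the table rows, because they are injected client-side:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the raw HTML; no JavaScript runs here
html = urlopen('https://www.badmintonengland.co.uk/coach/find-a-coach').read()
soup = BeautifulSoup(html, 'html.parser')

# The tbody targeted in the question is empty or absent in the static source,
# since the rows are populated by JavaScript after the page loads
tbody = soup.find('tbody', {'id': 'JGrid-az-com-1031-tbody'})
print(tbody)  # expected: None, or an element with no <tr> children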

1 Answer


If you use selenium to make the selections and then pass driver.page_source to pd.read_html to get the table, the JavaScript is allowed to run and populate the values.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

url = 'https://www.badmintonengland.co.uk/coach/find-a-coach'
driver = webdriver.Chrome()
driver.get(url)
ele = driver.find_element_by_css_selector('.az-triggers-panel a') #distance dropdown
driver.execute_script("arguments[0].scrollIntoView();", ele)
ele.click()
option = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.ID, "comboOption-az-com-1015-8"))) # any distance
option.click()
driver.find_element_by_css_selector('.az-btn-text').click()

time.sleep(5) #seek better wait condition for page update
tables = pd.read_html(driver.page_source)
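
As a usage note: pd.read_html returns a list of DataFrames, one per table element in the page source. Which index holds the coach table is an assumption, so inspect the list first:

# Inspect how many tables were parsed, then pick the right one
print(len(tables))
df = tables[0]  # index 0 is an assumption; check the output above
print(df.head())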
  • This seems awesome, but I decided to use bs4 instead of pandas immediately after the driver finished loading. But then, from the table itself, there is another JS popup that allows viewing the email. Am I able to call the driver again? – Nathan Kirui Apr 08 '19 at 16:14
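
Regarding the follow-up comment: the driver session stays open until driver.quit() is called, so you can keep interacting with the page after parsing page_source with bs4. A minimal sketch, using a hypothetical '.email-reveal' selector (inspect the page for the real one):

from bs4 import BeautifulSoup

# Parse the current DOM with bs4 instead of pandas
soup = BeautifulSoup(driver.page_source, 'html.parser')

# The driver is still usable; '.email-reveal' is a hypothetical selector
links = driver.find_elements_by_css_selector('.email-reveal')
if links:
    links[0].click()
    # Re-parse after the popup has updated the DOM
    soup = BeautifulSoup(driver.page_source, 'html.parser')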