
I am trying to get the contents of the tbody in the table with id="myTable" after entering a value in the registration number field, but I can't retrieve it.

I also tried BeautifulSoup with request headers (such as a User-Agent) and the form data shown in the browser's network tab, but that failed as well.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains

url="https://rof.mahaonline.gov.in/Search/Search"

driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table',{'id':'myTable'})
body = table.find('tbody')
print(body)

driver.close()

Please help me get this working. A solution that uses only BeautifulSoup and form data would be ideal. Thanks in advance.

undetected Selenium

2 Answers


To get the table body contents, wait for the result rows to become visible using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://rof.mahaonline.gov.in/Search/Search")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#registrationnumber"))).send_keys("MU000000001")
    driver.find_element(By.CSS_SELECTOR, "button#btnSearch").click()
    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#myTable tbody>tr td")))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#myTable tbody>tr"))).get_attribute("outerHTML"))
    
  • Using XPATH:

    driver.get("https://rof.mahaonline.gov.in/Search/Search")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='registrationnumber']"))).send_keys("MU000000001")
    driver.find_element(By.XPATH, "//button[@id='btnSearch']").click()
    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='myTable']//tbody/tr//td")))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='myTable']//tbody/tr"))).get_attribute("outerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    <tr role="row" class="odd"><td>1</td><td>MU000000001</td><td>13 June      2012</td><td>CLASSIC DEVELOPERS.</td><td>PURCHASE  AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.</td><td>House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,<br>StreetName:S.V.ROAD,,<br>Village/Town/City:DAHISAR (EAST),<br>Taluka:Mumbai(Suburban)<br>District:Mumbai Suburban,State:Maharashtra<br>Pincode:400068<br></td></tr>
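
For intuition about what the explicit waits above are doing: WebDriverWait calls a condition repeatedly until it returns something truthy or the timeout runs out. The snippet below is a minimal standard-library sketch of that polling idea, not Selenium's actual implementation. The names wait_until, WaitTimeout and table_rows are hypothetical, and the "page" is simulated with a timer so the example runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose table rows appear some time after the click.
rows_ready_at = time.monotonic() + 0.3


def table_rows():
    # Returns an empty list (falsy) until the simulated rows have "loaded".
    return ["<tr>...</tr>"] if time.monotonic() >= rows_ready_at else []


rows = wait_until(table_rows, timeout=2.0, poll_interval=0.1)
print(rows)
```

Unlike a fixed sleep, this returns as soon as the rows exist and fails loudly if they never show up, which is the main reason to prefer an explicit wait.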
    
undetected Selenium

Give the page a few seconds to load the results after clicking the search button, and only then read page_source.

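import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://rof.mahaonline.gov.in/Search/Search"

# Driver path as in the question; recent Selenium releases expect a Service object here instead.
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

# find_element(By.XPATH, ...) replaces find_element_by_xpath(), which newer Selenium 4 releases removed.
driver.find_element(By.XPATH, '//*[@id="registrationnumber"]').send_keys("MU000000001")
driver.find_element(By.XPATH, '//*[@id="btnSearch"]').click()
time.sleep(3)  # crude fixed wait for the search results to be rendered
soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table', {'id': 'myTable'})
body = table.find('tbody')
# Collect the text of every cell, one list per table row.
for row in body.find_all('tr'):
    tds = [td.text for td in row.find_all('td')]
    print(tds)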

Output:

['1', 'MU000000001', '13 June      2012', 'CLASSIC DEVELOPERS.', 'PURCHASE  AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.', 'House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,StreetName:S.V.ROAD,,Village/Town/City:DAHISAR (EAST),Taluka:Mumbai(Suburban)District:Mumbai Suburban,State:MaharashtraPincode:400068']
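
Note that in this output the address parts run together ("Taluka:Mumbai(Suburban)District:...") because td.text drops the <br> tags without leaving a separator. A possible way around that, sketched here on a shortened stand-in for the table row rather than the live page, is BeautifulSoup's get_text() with a separator. The helper name row_to_cells is hypothetical, and the sample markup leaves out the long description cell.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for one result row; the address cell uses <br> tags
# between its parts, as the page's own markup does.
row_html = (
    '<tr role="row" class="odd"><td>1</td><td>MU000000001</td>'
    '<td>13 June      2012</td><td>CLASSIC DEVELOPERS.</td>'
    '<td>StreetName:S.V.ROAD,,<br>Village/Town/City:DAHISAR (EAST),<br>'
    'Taluka:Mumbai(Suburban)<br>District:Mumbai Suburban,State:Maharashtra<br>'
    'Pincode:400068<br></td></tr>'
)


def row_to_cells(html):
    """Return the text of each <td>, with <br>-separated parts joined by a space."""
    soup = BeautifulSoup(html, "html.parser")
    cells = []
    for td in soup.find_all("td"):
        # The separator is placed between the text pieces around each <br>.
        text = td.get_text(" ", strip=True)
        # Collapse runs of whitespace such as the gap in "13 June      2012".
        cells.append(" ".join(text.split()))
    return cells


cells = row_to_cells(row_html)
for cell in cells:
    print(cell)
```

The same helper could be applied to each row found in the tbody; the whitespace collapsing is optional and only there to tidy the date column.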

Alternatively, you can use the pandas read_html() function to read the data from the table.

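import io
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

url = "https://rof.mahaonline.gov.in/Search/Search"
# Driver path as in the question; recent Selenium releases expect a Service object here instead.
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

driver.find_element(By.XPATH, '//*[@id="registrationnumber"]').send_keys("MU000000001")
driver.find_element(By.XPATH, '//*[@id="btnSearch"]').click()
time.sleep(3)  # crude fixed wait for the search results to be rendered
# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string.
table = pd.read_html(io.StringIO(driver.page_source))
print(table[0])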
KunduK
  • Thank you so much. Can you please solve this using only BeautifulSoup? That would save me time. –  Sep 11 '19 at 13:29
  • I tried using BeautifulSoup, but could not find the submit button in the form data shown in the network tab. –  Sep 11 '19 at 13:30
  • Well, I'll check later whether it is possible at all with BeautifulSoup only. – KunduK Sep 11 '19 at 14:45