How to scrape a table's columns, including the data shown when the link in the first column is clicked

Question

I have the link below:

http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune

I want to scrape the data from this page into Excel in a proper format. Each SurveyNo link reveals more data when it is clicked, and I want the row-wise table data together with the data shown after clicking the survey number.

I also want the output in the format shown in the attached image (desired output in Excel).

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import time
url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune'
chrome_path =r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chrome_path)
driver.implicitly_wait(10)
driver.get(url)
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('5')
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1872')
soup=BeautifulSoup(driver.page_source, 'lxml')
table = soup.find("table" , attrs = {'id':'ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate' })
with open('Baner.csv', 'w',encoding='utf-16',newline='') as csvfile:
     f = csv.writer(csvfile, dialect='excel')
     f.writerow(['SurveyNo','Subdivision', 'Open ground', 'Resident house','Offices','Shops','Industrial','Unit (Rs./)'])  # headers
     rows = table.find_all('tr')[1:] 
     data=[]
     for tr in rows:  
         cols = tr.find_all('td')
         for td in cols:
              links = driver.find_elements_by_link_text('SurveyNo')
              l =len(links)
              data12 =[]
              for i in range(l):
                   newlinks = driver.find_elements_by_link_text('SurveyNo')
                   newlinks[i].click()
                   soup = BeautifulSoup(driver.page_source, 'lxml')
                   td1 = soup.find("textarea", attrs={'class': 'textbox'})
                   data12.append(td1.text)
                   data.append(td.text)
                   data.append(data12)
              print(data)

Please see the attached image. I need the scraped data output in that format.

Your url isn't enough; it does get you to Pune, but it also asks for a Taluka and Village. – Jack Fleeting, May 03 '19 at 10:42
I'm not getting anything like your image. Can you provide the Devanagari form of Baner? Maybe that will help. – Jack Fleeting, May 03 '19 at 10:51
The image shows the required output in Excel. I need to scrape the data from the website and export it to Excel. – A.D, May 03 '19 at 11:35

QHarr · Accepted Answer · 2019-05-16T12:42:45.910

You could do the following and simply rearrange the columns at the end, along with any renaming you want. This assumes SurveyNo exists for all wanted rows. I extract the hrefs from the SurveyNo cells; these are actually executable JavaScript strings that you can pass to execute_script to show the survey numbers without worrying about stale elements.

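from selenium import webdriver
import pandas as pd

# Note: the find_element(s)_by_* helpers were deprecated in Selenium 4 and removed in later 4.x
# releases; there, use d.find_element(By.CSS_SELECTOR, ...) with selenium.webdriver.common.by.By.
url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune'
d = webdriver.Chrome()
d.get(url)
d.find_element_by_css_selector('[value="5"]').click()     # taluka option
d.find_element_by_css_selector('[value="1872"]').click()  # village option
tableElement = d.find_element_by_id('ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate')
table = pd.read_html(tableElement.get_attribute('outerHTML'))[0]
table.columns = table.iloc[0]  # the first row holds the column headers
table = table.iloc[1:]
table = table[table['Select'] == 'SurveyNo'].copy()  # assumption SurveyNo exists for all wanted rows
# each href is a javascript: string that can be executed directly
surveyNo_scripts = [item.get_attribute('href') for item in d.find_elements_by_css_selector("#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate [href*='Select$']")]
select_pos = table.columns.get_loc('Select')
for i, script in enumerate(surveyNo_scripts):
    d.execute_script(script)
    surveys = d.find_element_by_css_selector('textarea').text
    # assign in a single indexing step; table.iloc[i]['Select'] = ... is chained
    # indexing and may only update a temporary copy of the row
    table.iloc[i, select_pos] = surveys
print(table)
#rename and re-order columns as required
table.to_csv(r"C:\Users\User\Desktop\Data.csv", sep=',', encoding='utf-8-sig', index=False)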

Output before rename and re-order:

In a loop, you can concatenate all the DataFrames and then write them out in one go (my preference), or append each one to the CSV file as you go.
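
A minimal sketch of those two options, assuming each page or village has already been scraped into its own DataFrame. The frames and their values below are made up for illustration, and an in-memory buffer stands in for the CSV file:

```python
import io

import pandas as pd

# Hypothetical per-page results; in the scraper, each frame would come from one page.
page_frames = [
    pd.DataFrame({"Select": ["1, 2, 3"], "Subdivision": ["A"]}),
    pd.DataFrame({"Select": ["4, 5"], "Subdivision": ["B"]}),
]

# Option 1: collect every frame, concatenate once, write once.
combined = pd.concat(page_frames, ignore_index=True)
once = io.StringIO()
combined.to_csv(once, index=False)
csv_once = once.getvalue()

# Option 2: append page by page, writing the header only for the first page.
appended = io.StringIO()
for n, frame in enumerate(page_frames):
    frame.to_csv(appended, index=False, header=(n == 0))
csv_appended = appended.getvalue()
```

When writing to a real file path instead of a buffer, the second option would also need `mode="a"` in `to_csv`; otherwise each call replaces the file, which matches the overwriting described in the comments.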

edited May 16 '19 at 12:42

answered May 03 '19 at 13:29

QHarr

Please explain what is not working, as it worked when I ran it. Have you noticed that I am not using your path to selenium, for example? chrome_path =r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe' driver = webdriver.Chrome(executable_path=chrome_path) I also didn't need waits, but those can easily be added. – QHarr May 03 '19 at 16:19
It shows this error: AttributeError: 'DataFrame' object has no attribute 'Select' – A.D May 03 '19 at 16:25
I am guessing it is appearing differently for you. If you do the code up to and including the line: table = table.iloc[1:] and then print table. What is the name of the column that has the value 'SurveyNo' in it? – QHarr May 03 '19 at 16:27
See my comment above – QHarr May 03 '19 at 16:31
Actually, the (table) output is a DataFrame, and when I add the next page's data, it overwrites the CSV file. – A.D May 09 '19 at 10:08
When using the same code for url = http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana, the SurveyNo textarea data is not added to the Select column. – A.D May 14 '19 at 09:27
For the url = http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana I am selecting option value = 1 for Taluka and option value = 1458 for village. – A.D May 14 '19 at 09:30
Yes, I debugged the program. The surveys are printed properly by the code, but they are not being written into table.iloc[i]['Select']. I think the issue is with this line: table.iloc[i]['Select'] = surveys – A.D May 14 '19 at 09:32
I will have a look this evening – QHarr May 14 '19 at 09:32
I have not found a solution yet. Please assist. – A.D May 15 '19 at 03:00
What am I selecting for taluka and village? – QHarr May 16 '19 at 05:35
option value = 1 for Talukas and option value = 1458 for village – A.D May 16 '19 at 05:41
I used the url = igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana – A.D May 17 '19 at 06:08
For the above url, use taluka option value = 1 and village option value = 1458. – A.D May 17 '19 at 06:08
I’m in hospital and then at work. I will see what I can do over the weekend – QHarr May 17 '19 at 10:19
The above code works only for Pune district; it does not work for the talukas and villages of other districts. – A.D May 20 '19 at 10:46
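
The symptom reported in the comments (the surveys print correctly but never land in table.iloc[i]['Select']) is consistent with pandas chained indexing, where table.iloc[i] can return a temporary copy of the row, so the assignment is lost. Whether that is the actual cause for the Buldhana page is an assumption. A minimal sketch of a single-step positional assignment, using a made-up table and made-up textarea contents:

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: one text column and one numeric column.
table = pd.DataFrame({"Select": ["SurveyNo", "SurveyNo"], "Rate": [100, 200]})
surveys_per_row = ["1, 2, 3", "4, 5"]  # made-up textarea contents, one per row

# Look up the column position once, then set each cell with one .iloc call
# (row position, column position) instead of table.iloc[i]['Select'] = ...
select_pos = table.columns.get_loc("Select")
for i, surveys in enumerate(surveys_per_row):
    table.iloc[i, select_pos] = surveys

print(table)
```

Because the row and column are selected in one `.iloc` call, pandas writes into the original DataFrame rather than into an intermediate row object.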
