
I am trying to web-scrape property information from a county website.

First, what I would like to scrape:

URL: http://reparcelasmt.loudoun.gov/search/commonsearch.aspx?mode=parid

For example:
1. Enter the tax parcel ID "123205197000".
2. Click SEARCH.
3. Click the row to view the property detail.
4. Select the Residential tab.
5. Scrape the table under "Primary Building".

Second, what I have done so far: I've made some progress using the approach from the question "Scraping from aspx website":

library(httr)
library(XML)

basePage <- "http://reparcelasmt.loudoun.gov"
h <- handle(basePage)  # one handle, so cookies persist between requests
GET(handle = h)        # first request picks up any session cookies
res <- GET(handle = h, path = "/search/commonsearch.aspx?mode=parid")
resXML <- htmlParse(content(res, as = "text"))
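The page this GET returns appears to be a disclaimer rather than the search form: its form posts to Disclaimer.aspx and carries the original search URL in the FromUrl parameter. A minimal base-R sketch for checking where a request landed follows. form_action() and from_url() are hypothetical helpers, and they use regular expressions instead of an HTML parser, so they assume double-quoted attributes.

```r
# Hypothetical helper: return the action attribute of the first <form> tag,
# or NA if there is none. Regex-based, so it assumes action="..." quoting.
form_action <- function(html) {
  m <- regmatches(html, regexpr('<form[^>]* action="[^"]*"', html))
  if (length(m) == 0) return(NA_character_)
  sub('^.* action="([^"]*)"$', "\\1", m)
}

# Hypothetical helper: decode the FromUrl query parameter of a form action,
# i.e. the page the site will send you back to after the disclaimer.
from_url <- function(action) {
  if (!grepl("[?&]FromUrl=", action)) return(NA_character_)
  URLdecode(sub("^.*[?&]FromUrl=([^&]*).*$", "\\1", action))
}

# In the session above, `page` would be content(res, as = "text").
page <- '<form name="Form1" method="post" action="Disclaimer.aspx?FromUrl=..%2fsearch%2fcommonsearch.aspx%3fmode%3dparid" id="Form1">'
act <- form_action(page)
startsWith(act, "Disclaimer.aspx")  # TRUE: we landed on the disclaimer
print(from_url(act))                # "../search/commonsearch.aspx?mode=parid"
```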

In viewing resXML, I found the HTML below, which I think is what needs to be filled in and submitted somehow. The parts that look relevant are id="Form1" and id="action":

<form name="Form1" method="post" action="Disclaimer.aspx?FromUrl=..%2fsearch%2fcommonsearch.aspx%3fmode%3dparid" id="Form1">

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTg0Mjk5NDk3MWRkj8q93u53cL62jCmCkDzR+iRJJ70=">
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBQL8q9ymDgLpuJU7Aub60+ELAuO8lrkBAtL2kugI8BSyTTneHZXvLUVQf7YJFvW03XQ=">
<table cellpadding="1" width="430" align="center">
<tr>
<td align="center">
<input onclick="__doPostBack('btDisagree','')" name="btDisagree" type="button" id="btDisagree" class="MenuButton" style="WIDTH: 100px" value="Disagree">
</td><td align="center">
<input onclick="__doPostBack('btAgree','')" name="btAgree" type="button" id="btAgree" class="MenuButton" style="WIDTH: 100px" value="Agree">
</td></tr></table>
<input name="hdURL" type="hidden" id="hdURL" value="../search/commonsearch.aspx?mode=parid">
<input name="action" type="hidden" id="action">
</form>
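In ASP.NET WebForms, __doPostBack('btAgree','') does not follow a link: it writes the control name into a hidden __EVENTTARGET field, the argument into __EVENTARGUMENT, and submits the whole form to its action URL. So "clicking Agree" from R means POSTing the form's fields yourself. The sketch below builds that field list from the HTML above. Assumptions: postback_fields() and attr_value() are hypothetical helpers; __EVENTTARGET and __EVENTARGUMENT do not appear in the excerpt (ASP.NET normally renders them elsewhere in the page), so the sketch adds them explicitly; and the regex parsing assumes double-quoted attributes.

```r
# Hypothetical helper: value of attribute `attr` in each tag, "" if absent.
attr_value <- function(tags, attr) {
  m <- regmatches(tags, regexec(paste0(" ", attr, '="([^"]*)"'), tags))
  vapply(m, function(x) if (length(x) == 2) x[2] else "", character(1))
}

# Hypothetical helper: collect every hidden <input> as name = value, then set
# the two fields that __doPostBack(target, argument) would fill in.
postback_fields <- function(html, target, argument = "") {
  tags <- regmatches(html, gregexpr('<input[^>]*type="hidden"[^>]*>', html))[[1]]
  fields <- as.list(attr_value(tags, "value"))
  names(fields) <- attr_value(tags, "name")
  fields[["__EVENTTARGET"]] <- target
  fields[["__EVENTARGUMENT"]] <- argument
  fields
}

# Inputs copied from the form above (the button input is skipped: not hidden).
form_html <- paste(
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTg0Mjk5NDk3MWRkj8q93u53cL62jCmCkDzR+iRJJ70=">',
  '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBQL8q9ymDgLpuJU7Aub60+ELAuO8lrkBAtL2kugI8BSyTTneHZXvLUVQf7YJFvW03XQ=">',
  '<input onclick="__doPostBack(\'btAgree\',\'\')" name="btAgree" type="button" id="btAgree" class="MenuButton" style="WIDTH: 100px" value="Agree">',
  '<input name="hdURL" type="hidden" id="hdURL" value="../search/commonsearch.aspx?mode=parid">',
  '<input name="action" type="hidden" id="action">',
  sep = "\n"
)
body <- postback_fields(form_html, target = "btAgree")
str(body)
```

The list could then be sent with httr::POST(url, body = body, encode = "form", handle = h), where url is the form action resolved against res$url (for example with xml2::url_absolute()). This has not been tried against the live site, which may check more than these fields.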

If anyone has ideas on how to proceed, that would be fantastic.

Thank you, Matt

STATMATT

1 Answer


I was able to extract the text for parcel ID 123205197000 with the following code, which drives a real Firefox browser through RSelenium:

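library(RSelenium)

# Start a Selenium server with Firefox in Docker (requires Docker).
# shell() only exists on Windows; system() works on every platform.
system("docker run -d -p 4445:4444 selenium/standalone-firefox")
Sys.sleep(5)  # give the Selenium server a few seconds to start

remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://reparcelasmt.loudoun.gov/pt/search/CommonSearch.aspx?mode=PARID")
remDr$screenshot(display = TRUE, useViewer = TRUE)  # optional: see what the browser shows

# Type the parcel ID into the search box
web_Obj_InpParid <- remDr$findElement("id", "inpParid")
web_Obj_InpParid$sendKeysToElement(list("123205197000"))

# Click Search and wait for the next page to load
web_Obj_btSearch <- remDr$findElement("id", "btSearch")
web_Obj_btSearch$clickElement()
Sys.sleep(3)

# Read the text of the result area and split it into lines
web_Obj_Table <- remDr$findElement("xpath", "//*[@id='frmMain']/div[3]/div")
text <- web_Obj_Table$getElementText()[[1]]
text <- strsplit(text, "\n")[[1]]

remDr$close()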
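findElement() raises an error when the element is not on the page yet, which is common right after clickElement() triggers a page load. Rather than guessing a fixed delay, a small polling wrapper can retry until the call succeeds or a timeout passes. with_retry() below is a hypothetical helper, not part of RSelenium; the demo uses a fake function so it runs without a browser.

```r
# Hypothetical helper: call f() until it returns without an error, or stop
# once `timeout` seconds have passed, sleeping `interval` seconds in between.
with_retry <- function(f, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (Sys.time() >= deadline) stop("timed out: ", conditionMessage(result))
    Sys.sleep(interval)
  }
}

# Offline demo: a function that fails twice before "finding" the element.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("element not found yet")
  "found"
}
found <- with_retry(flaky, timeout = 5, interval = 0.01)
```

With the code above, this would look like web_Obj_Table <- with_retry(function() remDr$findElement("xpath", "//*[@id='frmMain']/div[3]/div")).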
Emmanuel Hamel