I have spent about 15 days trying to build a scraper in VBA. I was making decent progress day after day, but these last two days I've been stuck on the very last step needed to get the data.
This is a continuation of my previous post, which gave me a good starting point.
Here's the process I want to simulate using MSXML (not Internet Explorer):
- Open https://beacon.schneidercorp.com/
- Select "Iowa State"
- Select "Boone County, IA"
- Click on the popup link "Property Search"
- In the top red ribbon, click on the "Comp Search" label
- At the bottom of the resulting page, in the "Agricultural Comparables Search" section check the "Sale Date" checkbox
- Select 5 months in the "Sale Date" combobox
- Click on the "Search" button at the bottom of the "Agricultural Comparables Search" section
- In the resulting page, look for "Parcel ID" identified as "088327354400002" and click on the link on the "Recording" column (value "2020-0418")
I managed the first 8 steps, but I haven't been able to get the URL of the results that should come from clicking that last link held in "2020-0418".
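For reference, WebForms grid links like the one in the "Recording" column usually render as `javascript:__doPostBack('controlName','')` anchors, so the control name can be scraped from the HTML instead of hardcoded. Here is a minimal sketch; the exact `href` format (and whether the quotes are HTML-encoded as `&#39;`) is an assumption about this page:

```vba
' Sketch: pull the __doPostBack target out of the anchor whose text is "2020-0418".
' Assumes the link renders roughly as href="javascript:__doPostBack('...lnkRecording','')".
' If the page HTML-encodes the quotes (&#39;), decode them first.
Function GetPostBackTarget(html As String, linkText As String) As String
    Dim posText As Long, posStart As Long, posEnd As Long
    posText = InStr(1, html, ">" & linkText & "<", vbTextCompare)
    If posText = 0 Then Exit Function
    'Walk back to the __doPostBack call that precedes the link text
    posStart = InStrRev(html, "__doPostBack('", posText)
    If posStart = 0 Then Exit Function
    posStart = posStart + Len("__doPostBack('")
    posEnd = InStr(posStart, html, "'")
    GetPostBackTarget = Mid$(html, posStart, posEnd - posStart)
End Function
```

Something like `GetPostBackTarget(XMLpagina.responseText, "2020-0418")` would then return the grid row's control name, ready to be URL-encoded into `__EVENTTARGET`.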
As I did to get from the 8th step to the 9th, I looked at the Developer Tools "Network" tab and saw that the website sends a POST request, shown below.
**General**
Request URL: https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
Request Method: POST
Status Code: 302
Remote Address: 52.168.93.150:443
Referrer Policy: no-referrer-when-downgrade
**Response Headers**
alt-svc: quic=":443"; ma=2592000; v="44,43,39"
cache-control: private
content-encoding: gzip
content-length: 187
content-type: text/html; charset=utf-8
date: Sat, 27 Jun 2020 00:46:42 GMT
location: /Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=551&Q=1603287013
status: 302
vary: Accept-Encoding
**Request Headers**
:authority: beacon.schneidercorp.com
:method: POST
:path: /Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: es-ES,es;q=0.9
cache-control: max-age=0
content-length: 395
content-type: application/x-www-form-urlencoded
cookie: _ga=GA1.2.1299682399.1590279064; MODULES508=; MODULESVISIBILE508=18469; MODULES1024=; MODULESVISIBILE1024=29489%7C29501; MODULES501=; MODULESVISIBILE501=10310; _gid=GA1.2.449363625.1593013300; ASP.NET_SessionId=4xwgdh2cqto0kugirkani4vp; _gat=1
origin: https://beacon.schneidercorp.com
referer: https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
Query String Parameters (source view)
AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
**Form Data (source view)**
__EVENTTARGET=ctlBodyPane%24ctl02%24ctl01%24gvwAgCompResults%24ctl05%24lnkRecording&__EVENTARGUMENT=&__VIEWSTATE=cbg8zdrx99ofbjcpw9%2FCE8J0v2SY5W86N%2Fbx%2FU0CsnNPy9D3bcg%2F5YstkCGTwd03lObnZbF9%2B5QuO1lP658HYgyXsOmpImGVjhn47teNdO788MngiEN9qzZbzrOv8jZAd93B8QXltxoPV5dLVu0%2BELpETwwTteNsmbKNEr1IpBz2aSxsN1spJUTKy42SUE37HkdUqVpsQlCPHPyIomJH4b6CoepL2uG9y45pMbUYFZxPG5ob&__VIEWSTATEGENERATOR=569DB96F
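Since the status code is 302, one way to debug this step is to replay the POST with redirects disabled so the `Location` header can be read directly. `ServerXMLHTTP` follows redirects on its own, but `WinHttp.WinHttpRequest.5.1` exposes an option to turn them off. A sketch (the body string is whatever request body is being sent at that step):

```vba
' Sketch: capture the 302 Location instead of letting the client follow it.
' Late binding; option index 6 is WinHttpRequestOption_EnableRedirects.
Sub CheckRedirect(strURL As String, strBodyRequest As String)
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "POST", strURL, False
    http.Option(6) = False    'do not follow redirects
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send strBodyRequest
    Debug.Print http.Status                          'should print 302 if the postback was accepted
    Debug.Print http.getResponseHeader("Location")   'the relative URL of the results page
End Sub
```

If the postback is being rejected, the status and `Location` printed here will differ from the capture above, which narrows down where things go wrong.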
Next is a sample of my code:
Sub ScrapingTest()
Dim XMLpagina As New MSXML2.ServerXMLHTTP60
Dim htmlDoc As New MSHTML.htmlDocument
Dim strURL As String, strBodyRequest As String
Dim strETarget As String, strVState As String
Dim strT1 As String, strT2 As String, strT3 As String
Dim strPageID As String, strPageTypeID As String
'====================
'FOR VIEWING PURPOSES I ONLY SHOW A SHORT VERSION OF MY ORIGINAL CODE,
'TRYING TO CUT AS MUCH AS POSSIBLE...
'====================
'OPENING Comp Search Website - STEP 6
strURL = "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=2&PageID=578"
'SEND GET REQUEST
XMLpagina.Open "GET", strURL, False
XMLpagina.send
htmlDoc.body.innerHTML = XMLpagina.responseText
Call generarCopiaHtml(XMLpagina)
'GETTING THE VALUES TO BE SENT IN THE REQUEST BODY OF THE NEXT REQUEST
'GET THE EVENTTARGET
strETarget = "ctlBodyPane$ctl02$ctl01$btnSearch" 'I DON'T SCRAPE FOR THIS BECAUSE IT'S ALWAYS THE SAME
'GET THE VIEWSTATE VALUE
strT1 = "<input type='hidden' name='__VIEWSTATE' id='__VIEWSTATE' value='"
strT1 = Replace(strT1, "'", """")
strT2 = "' />"
strT2 = Replace(strT2, "'", """")
strVState = extraeVerg(XMLpagina.responseText, strT1, strT2) 'THIS CUSTOM FUNCTION EXTRACTS THE TEXT LYING BETWEEN strT1 AND strT2
'SETS THE REQUESTBODY
strBodyRequest = "__EVENTTARGET=ctlBodyPane%24ctl02%24ctl01%24btnSearch"
strBodyRequest = strBodyRequest & "&__EVENTARGUMENT="
strBodyRequest = strBodyRequest & "&__VIEWSTATE=" & strVState
strBodyRequest = strBodyRequest & "&__VIEWSTATEGENERATOR=569DB96F"
strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24chkUseSaleDate=on"
strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24cboSaleDate=5" 'DEFINES HOW MANY MONTHS THE SEARCH WILL GO THROUGH
strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24txtSaleDateHigh_VCS3Ag=" & Month(Now) & "%2F" & Day(Now) & "%2F" & Year(Now)
strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24txtCSRPointsHigh="
'OPENING Comp Search Website (SHOWING RESULTS) - STEP 9
'SEND THE REQUEST
XMLpagina.Open "POST", strURL, False
XMLpagina.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
XMLpagina.setRequestHeader "Content-Length", Len(strBodyRequest)
XMLpagina.send strBodyRequest
'GENERATE A LOCAL COPY OF THE RESPONSE
Call generarCopiaHtml(XMLpagina)
'BUILDING THE URL FOR THE NEXT REQUEST
strT1 = "{'Name':'Comp Results','PageId':"
strT1 = Replace(strT1, "'", """")
strT2 = ",'PageTypeId':"
strT2 = Replace(strT2, "'", """")
strT3 = ",'Icon"
strT3 = Replace(strT3, "'", """")
strPageID = extraeVerg(XMLpagina.responseText, strT1, strT2)
strPageTypeID = extraeVerg(XMLpagina.responseText, strT1 & strPageID & strT2, strT3)
'THE strURL MUST BE EXACTLY LIKE "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579"
strURL = "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=" & strPageTypeID & "&PageID=" & strPageID
'GETTING THE VALUES TO BE SENT IN THE REQUEST BODY
strT1 = "<input type='hidden' name='__VIEWSTATE' id='__VIEWSTATE' value='"
strT1 = Replace(strT1, "'", """")
strT2 = "' />"
strT2 = Replace(strT2, "'", """")
strVState = extraeVerg(XMLpagina.responseText, strT1, strT2)
'SETS THE REQUESTBODY
strETarget = "ctlBodyPane$ctl02$ctl01$gvwAgCompResults$ctl45$lnkRecording" 'THIS VALUE MIMICS THE CLICK ON THE RECORD "2020-0418" RELATED TO THE PARCEL ID "088327354400002"
strBodyRequest = "__EVENTTARGET=" & Application.WorksheetFunction.EncodeURL(strETarget)
strBodyRequest = strBodyRequest & "&__EVENTARGUMENT="
strBodyRequest = strBodyRequest & "&__VIEWSTATE=" & strVState
strBodyRequest = strBodyRequest & "&__VIEWSTATEGENERATOR=569DB96F"
'SEND THE REQUEST
XMLpagina.Open "POST", strURL, False
XMLpagina.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
XMLpagina.setRequestHeader "Content-Length", Len(strBodyRequest)
XMLpagina.send strBodyRequest
'GENERATE A LOCAL COPY OF THE RESPONSE
Call generarCopiaHtml(XMLpagina)
'AT THIS POINT I SHOULD BE LANDING ON THE "Results" PAGE, WITH A URL LIKE
' "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=551"
'WHICH LISTS THE PARCELS INVOLVED IN THE SALE, BUT IT STILL SHOWS THE PREVIOUS PAGE'S RESULTS...
'I CAN'T SEE WHAT I AM DOING WRONG...
End Sub
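One detail worth flagging for anyone reproducing this: unlike `MSXML2.XMLHTTP60`, `ServerXMLHTTP60` does not store cookies between calls, so the `ASP.NET_SessionId` cookie visible in the capture above is not carried forward automatically. Whether this particular site requires it is an assumption, but carrying it by hand looks like this (a sketch, reusing the variable names from my code):

```vba
' Sketch: persist the session cookie across ServerXMLHTTP requests.
Dim strCookies As String
XMLpagina.Open "GET", strURL, False
XMLpagina.send
'e.g. "ASP.NET_SessionId=4xwgdh2cqto0kugirkani4vp; path=/; HttpOnly"
strCookies = XMLpagina.getResponseHeader("Set-Cookie")
XMLpagina.Open "POST", strURL, False
'Send back only the name=value pair, not the cookie attributes
XMLpagina.setRequestHeader "Cookie", Split(strCookies, ";")(0)
XMLpagina.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLpagina.send strBodyRequest
```

If several `Set-Cookie` headers come back at once, `getAllResponseHeaders` may be needed to collect them all; the sketch only forwards the first one.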
My real goal is to repeat this process to get data on specific sales from every county in Iowa, but once the first one works the others won't be a problem.
Can someone show me what I am doing wrong here?
PS1: I apologize for asking another question about the same problem; the one I posted about ten days ago was wrong from top to bottom, as I was so tired at the time that I wrote some crazy stuff.
PS2: There seems to be a lot of information about this out there, but either I'm not well enough prepared to find the solution or my case isn't very common.