
I have spent about 15 days trying to build a scraper in VBA. I've been making decent progress day after day, but for the last two days I've been stuck on the very last step needed to get the data.

This is a continuation of my previous post, which gave me a good starting point.

Here's the process I want to simulate using MSXML (not Internet Explorer):

  1. Open https://beacon.schneidercorp.com/
  2. Select "Iowa State"
  3. Select "Boone County, IA"
  4. Click on the popup link "Property Search"
  5. In the top red ribbon, click on the "Comp Search" label
  6. At the bottom of the resulting page, in the "Agricultural Comparables Search" section, check the "Sale Date" checkbox
  7. Select 5 months in the "Sale Date" combobox
  8. Click on the "Search" button at the bottom of the "Agricultural Comparables Search" section
  9. On the resulting page, look for the "Parcel ID" identified as "088327354400002" and click on the link in the "Recording" column (value "2020-0418")

I managed the first 8 steps, but I haven't been able to get the URL of the results page that the last link ("2020-0418") should lead to.

As I did to get from the 8th to the 9th step, I looked at the "Network" tab of the browser's Developer Tools and saw that the website sends a POST request, as shown below.

**General**
    Request URL: https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
    Request Method: POST
    Status Code: 302 
    Remote Address: 52.168.93.150:443
    Referrer Policy: no-referrer-when-downgrade
**Response Headers**
    alt-svc: quic=":443"; ma=2592000; v="44,43,39"
    cache-control: private
    content-encoding: gzip
    content-length: 187
    content-type: text/html; charset=utf-8
    date: Sat, 27 Jun 2020 00:46:42 GMT
    location: /Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=551&Q=1603287013
    status: 302
    vary: Accept-Encoding
**Request Headers**
    :authority: beacon.schneidercorp.com
    :method: POST
    :path: /Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
    :scheme: https
    accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
    accept-encoding: gzip, deflate, br
    accept-language: es-ES,es;q=0.9
    cache-control: max-age=0
    content-length: 395
    content-type: application/x-www-form-urlencoded
    cookie: _ga=GA1.2.1299682399.1590279064; MODULES508=; MODULESVISIBILE508=18469; MODULES1024=; MODULESVISIBILE1024=29489%7C29501; MODULES501=; MODULESVISIBILE501=10310; _gid=GA1.2.449363625.1593013300; ASP.NET_SessionId=4xwgdh2cqto0kugirkani4vp; _gat=1
    origin: https://beacon.schneidercorp.com
    referer: https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
    sec-fetch-dest: document
    sec-fetch-mode: navigate
    sec-fetch-site: same-origin
    sec-fetch-user: ?1
    upgrade-insecure-requests: 1
    user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
**Query String Parameters (source view)**
    AppID=84&LayerID=795&PageTypeID=3&PageID=579&Q=1926372975
**Form Data (source view)**
__EVENTTARGET=ctlBodyPane%24ctl02%24ctl01%24gvwAgCompResults%24ctl05%24lnkRecording&__EVENTARGUMENT=&__VIEWSTATE=cbg8zdrx99ofbjcpw9%2FCE8J0v2SY5W86N%2Fbx%2FU0CsnNPy9D3bcg%2F5YstkCGTwd03lObnZbF9%2B5QuO1lP658HYgyXsOmpImGVjhn47teNdO788MngiEN9qzZbzrOv8jZAd93B8QXltxoPV5dLVu0%2BELpETwwTteNsmbKNEr1IpBz2aSxsN1spJUTKy42SUE37HkdUqVpsQlCPHPyIomJH4b6CoepL2uG9y45pMbUYFZxPG5ob&__VIEWSTATEGENERATOR=569DB96F
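
The captured form data above is percent-encoded (`%24` for `$`, `%2F` for `/`, `%2B` for `+`), while a `__VIEWSTATE` value read straight from the HTML is Base64 text that can contain `+`, `/` and `=`. In a form-encoded body, an unencoded `+` is decoded as a space. As a minimal sketch, assuming the goal is a body encoded the same way the browser sends it, a helper along these lines could be used. `FormEncode` and `BuildFormBody` are hypothetical names that do not appear in the code in this post, and the encoder assumes ASCII-only input.

```vba
Option Explicit

' Hypothetical helper: percent-encodes a string for an
' application/x-www-form-urlencoded body.
' Assumption: the input is ASCII only (true for Base64 __VIEWSTATE values
' and for ASP.NET control names such as ctlBodyPane$ctl02$...).
Public Function FormEncode(ByVal rawText As String) As String
    Dim i As Long, ch As String, code As Long
    Dim result As String

    For i = 1 To Len(rawText)
        ch = Mid$(rawText, i, 1)
        code = AscW(ch)
        Select Case code
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                result = result & ch            ' unreserved: 0-9 A-Z a-z - . _ ~
            Case 32
                result = result & "+"           ' space in form encoding
            Case Else
                result = result & "%" & Right$("0" & Hex$(code), 2)
        End Select
    Next i

    FormEncode = result
End Function

' Hypothetical helper: joins name/value pairs into a form body,
' encoding both the names and the values.
Public Function BuildFormBody(ParamArray pairs() As Variant) As String
    Dim i As Long, body As String

    For i = LBound(pairs) To UBound(pairs) - 1 Step 2
        If Len(body) > 0 Then body = body & "&"
        body = body & FormEncode(CStr(pairs(i))) & "=" & FormEncode(CStr(pairs(i + 1)))
    Next i

    BuildFormBody = body
End Function

Public Sub DemoFormBody()
    ' Prints: __EVENTTARGET=ctlBodyPane%24ctl02%24ctl01%24btnSearch&__EVENTARGUMENT=
    Debug.Print BuildFormBody("__EVENTTARGET", "ctlBodyPane$ctl02$ctl01$btnSearch", _
                              "__EVENTARGUMENT", "")
End Sub
```

Passing the raw `__VIEWSTATE` text as one more name/value pair would encode it in the same pass, so the body never mixes encoded and unencoded fields.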

Below is a shortened version of my code:

Sub ScrapingTest()
   Dim XMLpagina As New MSXML2.ServerXMLHTTP60
   Dim htmlDoc As New MSHTML.htmlDocument
   Dim strURL As String, strBodyRequest As String
   Dim strETarget As String, strVState As String
   Dim strT1 As String, strT2 As String, strT3 As String
   Dim strPageID As String, strPageTypeID As String
   
   '====================
   'FOR VIEWING PURPOSES I ONLY SHOW A SHORT VERSION OF MY ORIGINAL CODE,
   'TRYING TO CUT AS MUCH AS POSSIBLE...
   '====================
   
   'OPENING Comp Search Website - STEP 6
   strURL = "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=2&PageID=578"
   'SEND GET REQUEST
   XMLpagina.Open "GET", strURL, False
   XMLpagina.send
   htmlDoc.body.innerHTML = XMLpagina.responseText
   
   Call generarCopiaHtml(XMLpagina)
   
   'GETTING THE VALUES TO BE SENT IN THE REQUEST BODY OF THE NEXT REQUEST
   'GET THE EVENTTARGET
   strETarget = "ctlBodyPane$ctl02$ctl01$btnSearch" 'I DON'T SCRAPE FOR THIS BECAUSE IT'S ALWAYS THE SAME
   
   'GET THE VIEWSTATE VALUE
   strT1 = "<input type='hidden' name='__VIEWSTATE' id='__VIEWSTATE' value='"
   strT1 = Replace(strT1, "'", """")
   strT2 = "' />"
   strT2 = Replace(strT2, "'", """")
   strVState = extraeVerg(XMLpagina.responseText, strT1, strT2) 'THIS CUSTOM FUNCTION EXTRACTS THE TEXT LYING BETWEEN strT1 AND strT2
   
   'SETS THE REQUESTBODY
   strBodyRequest = "__EVENTTARGET=ctlBodyPane%24ctl02%24ctl01%24btnSearch"
   strBodyRequest = strBodyRequest & "&__EVENTARGUMENT="
   strBodyRequest = strBodyRequest & "&__VIEWSTATE=" & strVState
   strBodyRequest = strBodyRequest & "&__VIEWSTATEGENERATOR=569DB96F"
   strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24chkUseSaleDate=on"
   strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24cboSaleDate=5" 'DEFINES HOW MANY MONTHS THE SEARCH WILL GO THROUGH
   strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24txtSaleDateHigh_VCS3Ag=" & Month(Now) & "%2F" & Day(Now) & "%2F" & Year(Now)
   strBodyRequest = strBodyRequest & "&ctlBodyPane%24ctl02%24ctl01%24txtCSRPointsHigh="

   'OPENING Comp Search Website(SHOWING RESULTS)- STEP 9
   'SEND THE REQUEST
   XMLpagina.Open "POST", strURL, False
   XMLpagina.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
   XMLpagina.setRequestHeader "Content-Length", Len(strBodyRequest)
   XMLpagina.send strBodyRequest
   
   'GENERATE A LOCAL COPY OF THE RESPONSE
   Call generarCopiaHtml(XMLpagina)
   
   'BUILDING THE URL FOR THE NEXT REQUEST
   strT1 = "{'Name':'Comp Results','PageId':"
   strT1 = Replace(strT1, "'", """")
   strT2 = ",'PageTypeId':"
   strT2 = Replace(strT2, "'", """")
   strT3 = ",'Icon"
   strT3 = Replace(strT3, "'", """")
   strPageID = extraeVerg(XMLpagina.responseText, strT1, strT2)
   strPageTypeID = extraeVerg(XMLpagina.responseText, strT1 & strPageID & strT2, strT3)
   'THE strURL MUST BE EXACTLY LIKE "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=579"
   strURL = "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=" & strPageTypeID & "&PageID=" & strPageID
   
   'GETTING THE VALUES TO BE SENT IN THE REQUEST BODY
   
   strT1 = "<input type='hidden' name='__VIEWSTATE' id='__VIEWSTATE' value='"
   strT1 = Replace(strT1, "'", """")
   strT2 = "' />"
   strT2 = Replace(strT2, "'", """")
   strVState = extraeVerg(XMLpagina.responseText, strT1, strT2)

   'SETS THE REQUESTBODY
   strETarget = "ctlBodyPane$ctl02$ctl01$gvwAgCompResults$ctl45$lnkRecording" 'THIS VALUE MIMICS THE CLICK ON THE RECORD "2020-0418" RELATED TO THE PARCEL ID "088327354400002"
   strBodyRequest = "__EVENTTARGET=" & Application.WorksheetFunction.EncodeURL(strETarget)
   strBodyRequest = strBodyRequest & "&__EVENTARGUMENT="
   strBodyRequest = strBodyRequest & "&__VIEWSTATE=" & strVState
   strBodyRequest = strBodyRequest & "&__VIEWSTATEGENERATOR=569DB96F"
   'SEND THE REQUEST
   XMLpagina.Open "POST", strURL, False
   XMLpagina.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
   XMLpagina.setRequestHeader "Content-Length", Len(strBodyRequest)
   XMLpagina.send strBodyRequest
   
   'GENERATE A LOCAL COPY OF THE RESPONSE
   Call generarCopiaHtml(XMLpagina)
   
   'AT THIS POINT I SHOULD BE ON THE "Results" PAGE, WITH A URL LIKE THIS:
   ' "https://beacon.schneidercorp.com/Application.aspx?AppID=84&LayerID=795&PageTypeID=3&PageID=551" 
   'THAT PAGE LISTS THE PARCELS INVOLVED IN THE SALE, BUT THE RESPONSE STILL SHOWS THE PREVIOUS PAGE'S RESULTS...
   'I CAN'T SEE WHAT I AM DOING WRONG...
   
End Sub
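
The final comment in the Sub expects the last POST to land on `PageID=551`, which in the browser capture is reached through a 302 response with a `location` header. As a rough diagnostic sketch (not a confirmed fix), the redirect could be inspected directly by sending the POST with automatic redirects turned off. This assumes the `WinHttp.WinHttpRequest.5.1` COM object is available, which is a different client from the `MSXML2.ServerXMLHTTP60` used above, and that the same object is reused for the earlier requests so that any session cookie it holds is sent again. `PostWithoutRedirect` and `ResolveLocation` are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: turns a host-relative Location value such as
' "/Application.aspx?..." into an absolute URL based on baseUrl.
' Anything not starting with "/" is assumed to be absolute already.
Public Function ResolveLocation(ByVal baseUrl As String, ByVal location As String) As String
    Dim hostEnd As Long

    If Left$(location, 1) = "/" Then
        ' Find the first "/" after the "scheme://" part of baseUrl.
        hostEnd = InStr(InStr(1, baseUrl, "://") + 3, baseUrl, "/")
        If hostEnd = 0 Then hostEnd = Len(baseUrl) + 1
        ResolveLocation = Left$(baseUrl, hostEnd - 1) & location
    Else
        ResolveLocation = location
    End If
End Function

' Diagnostic sketch: sends a form POST without following redirects and
' returns the absolute redirect target, or "" if the reply is not a redirect.
' http is expected to be a WinHttp.WinHttpRequest.5.1 object (late bound).
Public Function PostWithoutRedirect(ByVal http As Object, ByVal url As String, _
                                    ByVal body As String) As String
    Const OPTION_ENABLE_REDIRECTS As Long = 6   ' WinHttpRequestOption_EnableRedirects

    http.Open "POST", url, False
    http.Option(OPTION_ENABLE_REDIRECTS) = False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.setRequestHeader "Referer", url
    http.send body

    Debug.Print "Status: "; http.Status
    Select Case http.Status
        Case 301, 302, 303, 307
            PostWithoutRedirect = ResolveLocation(url, http.getResponseHeader("Location"))
        Case Else
            PostWithoutRedirect = ""
    End Select
End Function
```

If the function returns an empty string where the browser got a 302, the server did not treat the POST as the link click, which narrows the problem to the request body or the session rather than to the follow-up GET.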

My real goal is to repeat this process to get data on specific sales from every county in Iowa; once the first one works, the rest shouldn't be a problem.

Can someone show me what I'm doing wrong and how to make this work?

PS1: I apologize for posting another question about the same problem. The one I asked about ten days ago was wrong from top to bottom; I was so tired then that I wrote some crazy stuff.

PS2: There seems to be a lot of information out there about this, but either I'm not experienced enough to work out the solution or my case is not very common.

  • Does the URI shown in the "Location" response header link to the results you are looking for? You can see that it is different to the URI shown in the "Path" request header (and the 302 status code would lead you to expect this behaviour) – barrowc Jun 28 '20 at 01:32
  • Yes, that location is the one that I'm looking for, I even know how to get it, but the URL by itself doesn't work if the right request headers have not been passed. I need to know why my method is not working... Thanks for your comment – Antonio Graterol Jun 28 '20 at 01:52
  • @Jason Aller thanks for fixing my misspellings, but can you also advise me on whether I can do something to give this more exposure and get a little help with it? – Antonio Graterol Jun 30 '20 at 01:48
