2

I have made a web scraper using vba and JSON Parser library. My code is as below, and it works fine.

Sub Setcustoms()
Dim JSON As Object
Dim ws As Worksheet, results(), i As Long, s As String
Dim shipvalue As String, custom As String, MyURL As String
Dim BL As String, returnshipvalue As String
Dim a, b As Variant
Dim mytext, finaltext As String
Dim myvalue As Object
Dim country() As String
Dim year As String



country = Split("NL,DE,MY,US,VN,UA,ID,JP,CN,CL,CA,TH,PL,RU,PH", ",")
'country = Split("MY,VN", ",")
Dim port() As String


port = Split("KRKAN,KRKUV,KRTSN,KRPUS,KRYMH,KRINC,KRPTK,KRKPO,KRKCN,KRBNP,KRUSN", ",")
'port = Split("KRKAN,KRKUV", ",")


Application.ScreenUpdating = False


i = 2
For Each a In country
    For Each b In port
        With CreateObject("MSXML2.XMLHTTP")
            .Open "POST", "https://www.bandtrass.or.kr/customs/total.do", False
            .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
            .setRequestHeader "Accept-Encoding", "gzip, deflate, br"
            .send "SELECT_DIV1=PORT_DIV&GODS_TYPE=H&ECONO_TYPE=undefined&PORT_TYPE=B&LOCATION_TYPE=undefined&FILTER1_GODS_UNIT=&SELECT_DIV2=NATN_DIV&FILTER2_GODS_UNIT=&SELECT_DIV3=GODS_DIV&FILTER3_GODS_UNIT=10&POP_TABLE=&COL_NAME=&EXCEL_LOG=&MENU_CODE=CUS00301_POP&EXCEL_SUBJECT=&SelectCd3=4401310000" + _
            "&SelectCd1=" & b & "&SelectCd2=" & a
            
            mytext = Right(.responseText, Len(.responseText) - 44)
            finaltext = Replace(Left(mytext, Len(mytext) - 2), "\", "")
            Set JSON = JsonConverter.ParseJson(finaltext)
            For Each myvalue In JSON
                If Len(myvalue("BASE_DATE")) = 5 Then
                    year = myvalue("BASE_DATE")
                Else
                    If myvalue("IM_WGHT") <> "" Then
                        Cells(i, 2).Value = DateSerial(CInt(Left(year, 4)), CInt(Left(LTrim(myvalue("BASE_DATE")), 2)) + 1, 0)
                        Cells(i, 3).Value = a
                        Cells(i, 4).Value = b
                        Cells(i, 5).Value = myvalue("IM_WGHT") / 1000
                        If myvalue("IM_WGHT") <> 0 Then
                            Cells(i, 6).Value = myvalue("IM_AMT") * 1000 / myvalue("IM_WGHT")
                            i = i + 1
                        Else
                        End If
                    Else
                    End If
                End If
            Next
        End With
    Next
Next
Application.ScreenUpdating = True

End Sub

The only issue I have is that the scraper takes about 10 minutes to finish. I would really want to speed up the process since I will be updating the data on a monthly basis. Another viable option is that I can scrape through the recent months, but in that case I have to rewrite the whole code.

Is there any possible method to speed up the process? Thank you.

Hanalia
  • 187
  • 1
  • 9
  • Without exact measurement of that code I would suppose the most of the time is spend inside of the Post call. To be sure, just put logging, measure the time of how long the Post calls take. How to solve this, well maybe to make several parallel Post calls? – glick Jan 05 '21 at 13:56
  • @glick can you do that in VBA? – Matteo NNZ Jan 05 '21 at 13:57
  • @MatteoNNZ yes should be possible, example e.g. [here](https://stackoverflow.com/questions/39397067/asynchronous-http-post-request-in-ms-access/39573114#39573114). – glick Jan 05 '21 at 14:01
  • @MatteoNNZ and e.g. [here](https://stackoverflow.com/questions/55219181/unable-to-make-my-script-work-asynchronously/55266803#55266803) – glick Jan 05 '21 at 14:05
  • @glick I didn't know the hack to handle asynchronous requests on VBA, thanks! But still, if you send N async requests, you still have one single thread to process the callback so your responses will queue one after the other so I don't think it will change much... honestly I'd move out of VBA and do this task with another language – Matteo NNZ Jan 05 '21 at 14:06
  • If the assumption is correct, that most of the time is spend in the Post calls, then the time could be reduced significantly using async calls. And yes, the processing will be then synchronous again, but the processing should takes just very little time. But all is just theory. – glick Jan 05 '21 at 14:09
  • @glick I've thought twice and what I said above is just not true. Agree with your last comment, it is worth a try. – Matteo NNZ Jan 05 '21 at 14:10
  • Thank you @glick Let me try testing with the asynchronous requests. – Hanalia Jan 05 '21 at 14:33
  • @Dy.Lee the whole row is only 2,500 so it is not much data. – Hanalia Jan 05 '21 at 14:36
  • @Matteo NNZ yes I agree Vba really is not an appropriate language for this. – Hanalia Jan 05 '21 at 14:37
  • Minimal gain but For Each is the slower enumeration for arrays. Use Lbound to Ubound – QHarr Jan 05 '21 at 16:42
  • @QHarr Thank you for your comment! I will try to get used to the Lbounds and Ubounds . – Hanalia Jan 05 '21 at 23:31
  • @QHarr indeed, the gain is minimal, too small in my opinion to sacrifice code readibility. – Matteo NNZ Jan 06 '21 at 08:17
  • @MatteoNNZ Maybe. Like yourself, I am interested in whether the mentioned async trick actually works i.e. doesn't end up failing with partial data loads. – QHarr Jan 06 '21 at 08:19
  • @QHarr indeed, that would be interesting. It seems a nice discovery for a single request, but I really have no idea how a single thread can handle both the queue of multiple responses received asynchronously and the treatment of the responses themselves. If one response happens to arrive while the main thread is processing the previous answer (highly probable), I can't get to see how the main thread (Excel or other) will be able to take it and queue it if it's busy processing the previous answer. – Matteo NNZ Jan 06 '21 at 08:23
  • @MatteoNNZ `I really have no idea how a single thread can handle both` - exactly. I don't think it can but love someone to prove it can work. – QHarr Jan 06 '21 at 08:24
  • 1
    @QHarr it seems a job for a C# function. VBA calls the C# function and waits for it synchronously, the C# function will collect the results asynchronously, and once all ready will return to VBA the list of results. That would avoid the hack (never trusted hacks) and would allow better debugging and maintenance. I say C#, but I guess it can be anything which is callable by VBA. – Matteo NNZ Jan 06 '21 at 08:28
  • @MatteoNNZ Agreed though I imagine C# would be faster than say Python. This sort of thing is even easier with R but don't know about COM/R. – QHarr Jan 06 '21 at 08:37

0 Answers0