4

How can I write code in VBA to get multi-threaded parsing?

I've looked at this tutorial, but it is not working.

I have 10000 sites, each site in one row in column A. I need at least 10 concurrent threads to parse info between tags <div></div>, take tag <a> with rel="external" from index.php on each site, then save the results to each row in column B.

Community
  • 1
  • 1
Valdemar Z
  • 91
  • 1
  • 1
  • 7
  • You might find this link interesting: http://stackoverflow.com/questions/5721564/multi-threading-in-vba – varocarbas Oct 03 '13 at 12:26
  • 7
    As the thread owner correctly mentions, you cannot do multithreading in VBA. What he is doing is opening multiple instances of Excel and then using each instance to perform the task. So basically VBA is not multithreading but the OS is doing the work. So you want to open 10 excel instances via code and then perform the actions then you can do that but that will make your system very slow. – Siddharth Rout Oct 03 '13 at 12:27
  • 1
    How about passing the 10,000 urls from column A to a COM-visible .net library that will do the multithreaded scraping and fire an event containing the result, that your VBA code can listen for and write to column B as the results come in? – Mathieu Guindon Oct 03 '13 at 13:58

3 Answers3

8

You can use multithreading in VBA but NOT natively. There are however several possibilities to achieve multithreading in VBA:

  1. C#.NET COM/dlls - create a COM/dll in C#.NET which allows you to freely create threads and reference it from VBA like other external libraries. See my post on this: here. See also this Stackoverflow post on referencing C# methods from within VBA: Using a C# dll inside EXCEL VBA
  2. VBscript worker threads - partition your algorithm into as many VBscripts as you need threads and execute them from VBA. The VBscripts can be created automatically via VBA. See my post on this: here
  3. VBA worker threads - copy your Excel workbook as many times as you need threads and execute them via VBscript from VBA. The VBscripts can be created automatically via VBA. See my post on this: here

I analyzed all these approaches and made a comparison of the pro's and con's and some performance metrics. You can find the whole post here:

http://analystcave.com/excel-multithreading-vba-vs-vbscript-vs-c-net/

Community
  • 1
  • 1
AnalystCave.com
  • 4,884
  • 2
  • 22
  • 30
  • Your blog is a fantastic answer to this question, but could you put in a little more detail here (e.g. briefly summarise the pros and cons)? – aucuparia Sep 28 '15 at 09:23
7

As @Siddharth Rout points out in his comment, the answer is no. But to expand on this a little, even methods that would seem to run in the background and enable multi-threading like abilities do not allow multithreading.

A great example of this is Application.OnTime. It allows a procedure to be run at a point in the future.

This method allows the user to continue editing the workbook until the preset amount of time has elapsed and the procedure is called. At first glance, it might seem possible that clever use of this would enable multiple code fragments to run simultaneously. Consider the following fragment:

For a = 1 To 500000000
Next a

The For...Next loop on my machine takes about 5 seconds to complete. Now consider this:

Application.OnTime Now + TimeValue("00:00:1"), "ztest2"
For a = 1 To 500000000
Next a

This calls "ztest2" one second after the Application.OnTime statement is read. It's conceivable that, since the For...Next loop takes 5 seconds and .OnTime will execute after 1 second, perhaps "ztest2" will be called in the midst of the For...Next loop, i.e., psuedo-multithreading.

Well, this does not happen. As running the above code will show, Application.OnTime must wait patiently until the For...Next loop is done.

Aaron Thomas
  • 5,054
  • 8
  • 43
  • 89
  • 1
    What about `DoEvents`? May it break `For...Next`? – LS_ᴅᴇᴠ Oct 03 '13 at 13:38
  • 1
    @LS_dev a DoEvents statement in the midst of the For...Next loop does allow control to pass to the processor, but how would you set up with the above scenario to allow ztest2 to run while the For...Next loop is running? I've been playing around with this but can't make it fit to this situation. – Aaron Thomas Oct 03 '13 at 14:34
0

While you can't do true multi-threading i.e. run threads simultaneously in parallel on different cores, you can simulate multi-threaded code by queueing up actions from multiple threads.

Example: Run subA once every 600 ms(milliseconds) and SubB once every 200 ms such that the order would be: SubB,SubB,SubB,SubA,SubB,SubB,SubB,SubA,SubB,SubB,...

'Create a new class Tick_Timer to get access to NumTicks which counts ticks in
'milliseconds.
'While not used for this script, this class can also be used for a millisecond
'StartTimer/EndTimer which I included below.
'It can also be used to create a pause, similar to wait but in ms, that can
'allow other code to run while paused which I prefer over the sleep function.
'Sleep doesn't allow interruptions and hogs processor time. 
'The pause function would be placed in a module and works similar to the
'Queue Timer loop which I'll explain below.
Private StartTick As Long
Private Declare Function GetTickCount Lib "kernel32" () As Long

Public Function GetNumTicks() As Long
    GetNumTicks = GetTickCount
End Function

'Timer functions(not used in this script)
Public Sub StartTimer()
    StartTick = GetTickCount
End Sub

Public Function EndTimer() As Long
    EndTimer = (GetTickCount - StartTick)
End Function

In a module I declared some global variables. While I'm aware many consider use of global variables s as bad practice. I always use a prefix for globals so they don't cause confusion with local variables.

In this case, I global variables have advantages over using arguments since new threads may be called at any time that may need to execute before the first timer in the queue.

Global variables can be changed anywhere so that updating the queue can be done dynamically. Also consider that nearly every subroutine uses the queue in some way so just make more sense to use globals.

Public ST_TimerName As String 'Subroutine Name that is run as a new thread.

'Two strings are used to store the queue. 
'The first stores only the start times of each thread in tickcounts. 
'This allows me to sort the queue more easily.
'The second string (ST_TimerQ) contains TimerDelay:TimerName and is created at the
'same time as the sorted launch times so they are kept synchronous.
Public ST_EndTickQ As String  'queue string: trigger times in TickCounts.
Public ST_TimerQ As String    'queue string: TimerDelay:TimerName 

'New class that allows you to get the current Tick Count.
Public ST_Timer As Tick_Timer 'timer that accesses to Tick Count

Sub SetTimer(ByVal TimerName As String, ByVal TimerDelay As Long)
'Starts a new thread called TimerName which executes after TimerDelay(ms)
'TimerName: Name of subroutine that is to be activated.
'TimerDelay:
'-value for single execution after abs(-value) delay,
'+value Repeats TimerName with a period of TimerDelay.
'0 stops repeating TimerName.
    Dim EndTick As Long
    Dim TimerDat As String

    Set ST_Timer = New Tick_Timer
    EndTick = ST_Timer.GetNumTicks + Abs(TimerDelay)

    If TimerDelay = 0 Then
    'Stops TimerName
        RemoveFromQ TimerName
    Else
    'Insert to Queue, single or repeated is determined by +/-delay stored in TimerDat.
        TimerDat = TimerDelay & ":" & TimerName
        Call AddToQ(TimerDat, EndTick)
    End If
End Sub 'SetTimer

Sub SetTimerQLoop()
'All threads are continuously merged into an action queue with a sequential
'insertion sort.
'A simple loop containing only the DoEvents function(allows other VBA code to run) 
'loops until the the next thread in the queue needs to start.
'An outer loop runs through the queue until EOQ.
    Dim EndTick As Long
    Dim EOQ As Boolean

    On Error GoTo ErrHandler
    EOQ = False
    'SetTimer Queue Loop
    Do While Not (EOQ)
        'Delay Loop, DoEvents allows other vba scripts to run during delay.
        Do
            DoEvents
        Loop Until ST_Timer.GetNumTicks >= Val(ST_EndTickQ)

        Application.Run ST_TimerName

        If Val(ST_TimerQ) > 0 Then
        'Reinsert into queue threads with pos delay value.
            EndTick = Val(ST_EndTickQ) + Val(ST_TimerQ)
            TimerDat = Val(ST_TimerQ) & ":" & ST_TimerName
            Call AddToQ(TimerDat, EndTick)
        End If

        If ST_TimerQ = vbNullString Then
            EOQ = True
        Else
            GetNextQ
        End If
    Loop
Exit Sub
ErrHandler:
    'Break Key
End Sub 'SetTimerQLoop

Sub AddToQ(ByVal TimerDat As String, ByVal EndTick As Long)
    Dim EndTickArray() As String
    Dim TimerArray() As String
    Dim LastTickIndex As Integer
    Dim LastTimerIndex As Integer

    Dim PosDatDel As Integer
    Dim TimerDelay As Long
    Dim TimerName As String
    Dim QFirstTick As Long
    Dim QLastTick As Long

    PosDatDel = Len(TimerDat) - InStr(TimerDat, ":")
    TimerDelay = Val(TimerDat)
    TimerName = Right(TimerDat, PosDatDel)

    If ST_EndTickQ = vbNullString Then
    'First timer
        ST_TimerName = TimerName
        ST_EndTickQ = EndTick
        ST_TimerQ = TimerDat
        SetTimerQLoop
    ElseIf InStr(ST_EndTickQ, "|") = 0 Then
    'Second timer
        If EndTick < Val(ST_EndTickQ) Then
        'New timer is first of 2 in Q
            ST_TimerName = TimerName
            ST_EndTickQ = EndTick & "|" & ST_EndTickQ
            ST_TimerQ = TimerDat & "|" & ST_TimerQ
        Else
        'New timer is 2nd of 2 in Q
            ST_TimerName = TimerNameF(ST_TimerQ)
            ST_EndTickQ = ST_EndTickQ & "|" & EndTick
            ST_TimerQ = ST_TimerQ & "|" & TimerDat
        End If
    Else
    '3rd+ timer: split queue into an array to find new timers position in queue.
        TimerArray = Split(ST_TimerQ, "|")
        LastTimerIndex = UBound(TimerArray)
        EndTickArray = Split(ST_EndTickQ, "|")
        LastTickIndex = UBound(EndTickArray)
        ReDim Preserve EndTickArray(LastTickIndex)
        ReDim Preserve TimerArray(LastTimerIndex)
        QFirstTick = Val(ST_EndTickQ)
        QLastTick = Val(EndTickArray(LastTickIndex))

        If EndTick < QFirstTick Then
        'Front of queue
            ST_EndTickQ = EndTick & "|" & ST_EndTickQ
            ST_TimerQ = TimerDat & "|" & ST_TimerQ
            ST_TimerName = Val(ST_TimerQ)
        ElseIf EndTick > QLastTick Then
        'Back of queue
            ST_TimerName = TimerNameF(ST_TimerQ)
            ST_EndTickQ = ST_EndTickQ & "|" & EndTick
            ST_TimerQ = ST_TimerQ & "|" & TimerDat
        Else
        'Somewhere mid queue
            For i = 1 To LastTimerIndex
                If EndTick < EndTickArray(i) Then
                    ST_EndTickQ = Replace(ST_EndTickQ, EndTickArray(i - 1), _
                    EndTickArray(i - 1) & "|" & EndTick)
                    ST_TimerQ = Replace(ST_TimerQ, TimerArray(i - 1), _
                    TimerArray(i - 1) & "|" & TimerDat)
                    Exit For
                End If
            Next i
            ST_TimerName = TimerNameF(ST_TimerQ)
        End If
    End If
End Sub 'AddToQ

Sub RemoveFromQ(ByVal TimerName As String)
    Dim EndTickArray() As String
    Dim TimerArray() As String
    Dim LastTickIndex As Integer
    Dim LastTimerIndex As Integer
    Dim PosDel As Integer

    PosDel = InStr(ST_EndTickQ, "|")

    If PosDel = 0 Then
    'Last element remaining in queue
        ST_EndTickQ = vbNullString
        ST_TimerQ = vbNullString
        ST_TimerName = vbNullString
    Else
    '2+ elements in queue
        TimerArray = Split(ST_TimerQ, "|")
        LastTimerIndex = UBound(TimerArray)
        EndTickArray = Split(ST_EndTickQ, "|")
        LastTickIndex = UBound(EndTickArray)
        ReDim Preserve EndTickArray(LastTickIndex)
        ReDim Preserve TimerArray(LastTimerIndex)
        ST_TimerQ = vbNullString
        ST_EndTickQ = vbNullString
        For i = 0 To LastTimerIndex
            If InStr(TimerArray(i), TimerName) = 0 Then
                If ST_TimerQ = vbNullString Then
                    ST_TimerQ = TimerArray(i)
                    ST_EndTickQ = EndTickArray(i)
                    X = Len(ST_TimerQ) - InStr(ST_TimerQ, ":")
                    ST_TimerName = Right(ST_TimerQ, X)
                Else
                    ST_TimerQ = ST_TimerQ & "|" & TimerArray(i)
                    ST_EndTickQ = ST_EndTickQ & "|" & EndTickArray(i)
                End If
            End If
        Next i
    End If
End Sub 'RemoveFromQ

Sub GetNextQ()
    Dim PosDel As Integer

    PosDel = InStr(ST_EndTickQ, "|")
    If PosDel = 0 Then
    'Last element remaining in queue
        ST_EndTickQ = vbNullString
        ST_TimerQ = vbNullString
        ST_TimerName = vbNullString
    Else
    '2+ elements in queue
        ST_EndTickQ = Right(ST_EndTickQ, Len(ST_EndTickQ) - PosDel)
        ST_TimerQ = Right(ST_TimerQ, Len(ST_TimerQ) - InStr(ST_TimerQ, "|"))
        ST_TimerName = TimerNameF(ST_TimerQ)
    End If
End Sub 'GetNextQ

Public Function TimerNameF(ByVal TimerQ As String) As String
    Dim StrLen As Integer
    If InStr(ST_TimerQ, "|") Then
        StrLen = InStr(ST_TimerQ, "|") - InStr(ST_TimerQ, ":") - 1
    Else
        StrLen = Len(ST_TimerQ) - InStr(ST_TimerQ, ":")
    End If
    TimerNameF = Mid(ST_TimerQ, InStr(ST_TimerQ, ":") + 1, StrLen)
End Function

Sub TestSetTimer1()
'Call SubA every 5 seconds
    Call SetTimer("SubA", 600)
End Sub

Sub TestSetTimer2()
'Call SubB every second
    Call SetTimer("SubB", 200)
End Sub

Sub TestSetTimer3()
'Stop calling SubA
    Call SetTimer("SubA", 0)
End Sub

Sub TestSetTimer4()
'Stop calling SubB
    Call SetTimer("SubB", 0)
End Sub

Sub TestSetTimer5()
'Call SubC one time after a 3 second delay.
    Call SetTimer("SubC", -3000)
End Sub


Sub SubA()
    Debug.Print "SubA Queue: " & ST_TimerQ & ", EndTickQ: " & ST_EndTickQ
End Sub

Sub SubB()
    Debug.Print "SubB Queue: " & ST_TimerQ & ", EndTickQ: " & ST_EndTickQ
End Sub

Sub SubC()
    Debug.Print "SubC Queue: " & ST_TimerQ & ", EndTickQ: " & ST_EndTickQ
End Sub

I'm not an expert coder so I'm sure others can do better but it runs fairly well as written. The bulk of the code just manages the queue which can probably be done more efficiently.

Besides SetTimer, You can also create threads that trigger on a schedule, with a mouse or keyboard event, or even for screen scrapping pixels in your active window.

Threads are useful when you can't know when to activate code at design time. For example:

You create a poker HUD+DB for on-line tournament poker. One thread could run every couple hundred ms waiting for triggers such as when a new hand begins you read the last HH and update the database and hud, or a new player joins the table and it does an automatic look up on a tournament tracking site. Another thread may run every second to update a tournament clock displayed on your hud and provide a 3 min warning before level changes and so on.

You can even create a seperate shell script to run a thread that automatically joins new tournaments you scheduled for in advance and it can then launch a new copy of your script for each table your playing. I'm not certain if launching multiple copies of scripts or running scripts from different projects can then truly multi-thread using VBA but I kind of doubt it based on what I've seen in the forums.

Note, while it was running bug free, I made some changes to clean it up a bit and introduced some minor bugs I never got a chance to fix.