13

Final final results:

I was wondering whether the results below changed if the string was longer. I ran exactly the same tests on the same computer, except each cell had a random string of 34 characters rather than four. These were the results:

Comintern (Regexp):       136.1  ms  
brettdj (Regexp):         139.9  ms  
Slai (Regexp):            158.4  ms  
*Original Regex:          161.0  ms*    
Comintern (AN):           170.1  ms  
Comintern (Hash):         183.6  ms  
ThunderFrame:             232.9  ms    
*Original replace:        372.9  ms*  
*Original InStr:          478.1  ms*  
CallumDA33:              1218.1 ms

This really shows the speed of the Regex - all the solutions utilising Regex.replace are significantly faster, with the best being Comintern's implementation.

In summary, if the strings are long, use arrays, if they are short, use the clipboard. If unsure, the optimal result is to use arrays, but this may sacrifice a little performance on short strings.

Final results:

Thanks very much for all of your suggestions, clearly I still have a lot to learn. I was thinking about this all yesterday, so I decided to rerun everything at home. Here are the final results, based on applying each of these to 30,000 four character strings.

My computer at home is an Intel i7 @ 3.6 GHz, 8GB RAM, 64-bit Windows 10 and Excel 2016. Similar conditions to before in that I have processes running in the background, but I'm not actively doing anything throughout the tests.

Original replace:  97.67  ms
Original InStr:    106.54 ms
Original Regex:    113.46 ms
ThunderFrame:      82.21  ms
Comintern (AN):    96.98  ms
Comintern (OR):    81.87  ms
Comintern (Hash):  101.18 ms
brettdj:           81.66  ms
CallumDA33:        201.64 ms
Slai:              68.38  ms

I've therefore accepted Slai's answer as it is clearly the fastest for general implementation, but I'll rerun them all at work against the actual data to check this still works.


Original post:

I have an array in Excel that is a list of part numbers. I need to turn every member of the array alphanumeric, for example

ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001  -> ABC123001

What is the fastest way of doing this?

For context, our part numbers can come in different forms, so I'm writing a function that finds the best match within a given range. At the moment, the part of the function that makes everything alphanumeric takes about 50ms to run, whereas the rest of the function takes around 30ms in total. I also can't avoid using Excel.

I've done some work myself (see answer below), but the main issue is that I have to loop through every element of the array one-by-one - could there be a better way? I've also never run tests before, so any feedback on improving them would be much appreciated.

Here is what I've tried so far.

I'm using MicroTimer and my computer has an Intel i5 @2.5GHz, 4GB of RAM, 64-bit Windows 7. I've got processes running in the background, but I'm not actively doing anything else whilst these are run.

I created 30,000 lines of random symbols using this code:

=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))

(note how we stop the first character at 60 because '=' is char(61) and we want to avoid Excel interpreting this as a formula. Also we force the second character to be a number so we can guarantee at least one alphanumeric character in there.)

1. Using a loop based on cases. Average time: 175ms

Using the function in this post, we load the range into an array, apply the function to each element of the array and paste it back. Code:

Function AlphaNumericOnly(strSource As Variant) As String
    Dim i As Integer
    Dim strResult As String

    For i = 1 To Len(strSource)
        Select Case Asc(Mid(strSource, i, 1))
            Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                strResult = strResult & Mid(strSource, i, 1)
        End Select
    Next
    AlphaNumericOnly = strResult
End Function

Sub Replace()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("Replace")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim i As Integer
        For i = LBound(arr) To UBound(arr)
            arr(i, 1) = AlphaNumericOnly(arr(i, 1))
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub

2. Using InStr() to check each character. Average time: 201ms

Define a string of valid values. Check one-by-one if the valid values appear in the array elements:

Sub InStr()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("InStr")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim validValues As String
        validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely'

    Dim i As Integer, j As Integer
    Dim result As String

        For i = LBound(arr) To UBound(arr)
        result = vbNullString
            For j = 1 To Len(arr(i, 1))
                If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then
                    result = result & Mid(arr(i, 1), j, 1)
                End If
            Next j
        arr(i, 1) = result
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub

3. Using regex.Replace on the array. Time: 171ms

Define a regex and use this to replace each element of the array.

Sub Regex()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("Regex")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim objRegex As Object
        Set objRegex = CreateObject("vbscript.regexp")
        With objRegex
            .Global = True
            .ignorecase = True
            .Pattern = "[^\w]"
        End With

    Dim i As Integer
        For i = LBound(arr) To UBound(arr)
            arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub

Edit:

@ThunderFrame - our part numbers generally come in the following formats:

  • All numbers (e.g. 32523452)
  • Mix of letters and numbers (e.g. AB324K234 or 123H45645)
  • Mix of letters and numbers, each linked by a non-alphanumeric character (e.g. ABC001-001, ABC001/001, 123/4557-121)

I have thought about using regex.test on each string before launching into the replacement, but I'm not sure if this will just copy the string to then test it, in which case I may as well just make the replacement to start with.

@Slai - thanks for the link - I will look into that in more detail

Community
  • 1
  • 1
Jonathan
  • 1,015
  • 1
  • 9
  • 25
  • Your sample data looks like it has a pattern to it, but your random data is very different. If there's a pattern to the placement and frequency of the characters, then there's probably an optimized approach. Can you provide more real examples/rules? – ThunderFrame Dec 02 '16 at 10:39
  • 1
    Thanks for the quick responses - I've replied in the answer above – Jonathan Dec 02 '16 at 11:00
  • You could easily check if the part number is actually all number and skip a lot of code - should reduce the average time – CallumDA Dec 02 '16 at 13:08
  • Deleted the link by accident on my phone. I think the point was that Byte arrays are a bit faster than `Mid$` https://fastexcel.wordpress.com/2011/10/18/writing-efficient-vba-udfs-part-6-faster-strings-and-byte-arrays/ – Slai Dec 03 '16 at 16:11
  • 2
    Thanks for posting your benchmarking results. The most important result is the production result. It's interesting to see how much of the total time is consumed by overhead (i.e. the Excel specific stuff or creating objects like a `RegExp` or `Dictionary`). The differences also narrow so much when you're looking at 30K v 1M calls that makes it clear that there are points of diminishing returns from micro-optimization based on the data size. – Comintern Dec 03 '16 at 16:22
  • 1
    Interesting that the Clipboard method ended up below the other RegExp approaches for the actual test. I tried a similar approach concatenating the values to a single string, running the regexp, splitting the string back to the cells. My original regexp loop was 20% faster – brettdj Dec 06 '16 at 07:55
  • Would have split the bounties equally if I could - system forced me to up the second one. Great question (with continuous involvement) with great answers. – brettdj Dec 06 '16 at 11:31
  • 2
    Thanks brett, I really appreciate your input too :) – Jonathan Dec 06 '16 at 12:14
  • Not a huge issue, but in your InStr version, the character ZERO is in your validValues assignment TWICE. I fixed that typo in my answer. – Excel Hero May 21 '20 at 16:17

6 Answers6

7

Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.

'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing

Dim r As Range, s As String
Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000
With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
   r.Copy
   .GetFromClipboard
    Application.CutCopyMode = False
    s = .GetText
    .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text"

    With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp")
        .Global = True
        '.IgnoreCase = False ' .IgnoreCase is False by default
        .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters
        s = .Replace(s, vbNullString)
    End With

    .SetText s
    .PutInClipboard
End With

' about 70% of the time is spent here in pasting the data 
r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1

'Debug.Print Timer - t

I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.

Disabling events didn't seem to make difference in my tests, but might be worth trying.

Note that there is a tiny chance of another application using the clipboard while the macro is using it.

If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.

Graham
  • 7,431
  • 18
  • 59
  • 84
Slai
  • 22,144
  • 5
  • 45
  • 53
  • Very nice! I noticed it adds a number of blank cells. Why is that? – Brian Dec 05 '16 at 12:19
  • @Brian I am not sure what you mean. `vbTab` (`\t`) character is used for cell separator and `vbCrLf` (`\r\n`) new line is used for row separator, so you can trim those from the end of the string to avoid some of the blank cells. If any of the cell values contain any of those 3 characters, they will be surrounded by `"`. – Slai Dec 05 '16 at 12:39
  • @Slai I just mean that I ran your code on 50,000 rows and it worked perfectly but added over 1,600 blank cells. When I removed the blank cells it evened everything out and I saw the _before and after_ on the same line. I know nothing about Regex, which I need to give some time and attention to. :-) – Brian Dec 05 '16 at 12:58
  • @Brian if you mean the last row of blank cells, it is because a new line is always added after the copied cells. When pasted back, Excel pastes it as a rectangular range and the new line is treated as row of blank cells. – Slai Dec 05 '16 at 13:00
  • @Slai I'm sorry, I meant 1,600 additional rows (not blank). There is a blank cell every so often down the column. So it pushes everything down. But I think I follow what you're saying. +1 for an excellent solution. – Brian Dec 05 '16 at 13:06
  • @Slai. I'm intrigued by the use of the clipboard. Is it faster than, say, `Join(Application.Transpose(Range("A1:a30000")), vbCrLf)` ? – Mark.R Dec 05 '16 at 17:06
  • @Mark.R I would recommend testing it, because it might depend on factors that I am unaware of. My guess is yes, because in my tests the opposite `r.Value2 = Application.Transpose(Split(s, vbCrLf))` was slower than `r.PasteSpecial`. The general idea behind this answer is converting all values at once to avoid some of the extra memory allocations and small overhead that comes from converting them one by one. The Clipboard is just the easiest way I could think of, but probably not the best. – Slai Dec 05 '16 at 18:24
  • @Mark.R Yes. I tested this and the Clipboard was faster, interesting that the array loop prevailed though on the longer strings. – brettdj Dec 06 '16 at 08:44
  • 2
    @brettdj Thanks. I was expecting it the other way around with it the process all at once approach getting faster the more characters are used. My guess is that it gets slower the more memory is needed, so I am thinking of testing if processing the range in two parts would help. I had another idea with Evaluate that could potentially be faster, but it probably requires the Excel 2016 CONCAT/TEXTJOIN function and I am limited to Excel 2007. – Slai Dec 06 '16 at 10:43
  • Nice idea. From Charles William's the worksheet evaluate is faster than application evaluate. I will try your suggestion. – brettdj Dec 06 '16 at 11:33
6

tl;dr - Regular expressions destroy VBA implementations. If this were a code challenge, @brettj or @Slai should win it.

There are a bunch of tricks to make your AlphaNumericOnly faster.

First, you can get rid of the vast majority of the function calls by treating it as a byte array instead of a string. That removes all of the calls to Mid$ and Asc. Although these are incredibly fast functions, they still add the overhead pushing onto and popping off of the call stack. That adds up over a couple hundred thousand iterations.

The second optimization is to not use Case x To y syntax if you can avoid it. The reason has to do with how it compiles - it doesn't compile to a test like Case = Condition >= x And Condition <= y, it actually creates a loop with an early exit condition like this:

Case = False
For i = x To y
    If Condition = i Then
        Case = True
    End If
Next

Again, not a huge performance hit, but it adds up. The third optimization is to order your tests in a way that makes them sort circuit on the most likely hits in your data set. I tailored my examples below for primarily letters, with most of them upper case. You may do better with different ordering. Put it all together and you get something that looks like this:

Public Function ByteAlphaNumeric(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte
    chars = CStr(source)        'Load the array up.

    Dim bound As Long
    bound = UBound(chars)       'Size the outbound array.
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    For i = 0 To bound Step 2   'Wide characters, only care about the ASCII range.
        Dim temp As Byte
        temp = chars(i)         'Pointer math isn't free. Cache it.
        Select Case True        'Order is important here.
            Case temp > 64 And temp < 91
                outVal(pos) = temp
                pos = pos + 2   'Advance the output pointer.
            Case temp < 48
            Case temp > 122
            Case temp > 96
                outVal(pos) = temp
                pos = pos + 2
            Case temp < 58
                outVal(pos) = temp
                pos = pos + 2
        End Select
    Next
    'This is likely the most expensive operation.
    ReDim Preserve outVal(pos)  'Trim the output array.
    ByteAlphaNumeric = outVal
End Function

How does it do? Pretty well:

Public Sub Benchmark()
    Dim starting As Single, i As Long, dummy As String, sample As Variant

    sample = GetRandomString

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumericOnlyOP(sample)
    Next i
    Debug.Print "OP's AlphaNumericOnly: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumericOnlyThunderframe(sample)
    Next i
    Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumeric(sample)
    Next i
    Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = ByteAlphaNumeric(sample)
    Next i
    Debug.Print "ByteAlphaNumeric: ", Timer - starting

    Dim cast As String
    cast = CStr(sample)
    starting = Timer
    For i = 1 To 1000000
        dummy = ByteAlphaNumericString(cast)
    Next i
    Debug.Print "ByteAlphaNumericString: ", Timer - starting

    Set stripper = Nothing
    starting = Timer
    For i = 1 To 1000000
        dummy = OptimizedRegex(sample)
    Next i
    Debug.Print "OptimizedRegex: ", Timer - starting

End Sub

Private Function GetRandomString() As Variant
    Dim chars(30) As Byte, i As Long
    Randomize
    For i = 0 To 30 Step 2
        chars(i) = Int(96 * Rnd + 32)
    Next i
    Dim temp As String
    temp = chars
    GetRandomString = CVar(temp)
End Function

Results with a 15 character random String:

OP`s AlphaNumericOnly:                     6.565918 
ThunderFrame`s AlphaNumericOnly:           3.617188 
CallumDA33`s AlphaNumeric:                23.518070 
ByteAlphaNumeric:                          2.354980

Note, I omitted submissions that weren't trivial to convert to functions. You may notice 2 additional test - the ByteAlphaNumericString is exactly the same as the ByteAlphaNumeric function, but it takes a String as input instead of a Variant and gets rid of the cast. That's not trivial:

ByteAlphaNumericString:                    2.226074

And finally, the elusive OptimizedRegex function (basically @brettj's code in function form for comparison timing):

Private stripper As RegExp  'Module level

Function OptimizedRegex(strSource As Variant) As String
    If stripper Is Nothing Then
        Set stripper = New RegExp
        With stripper
            .Global = True
            .Pattern = "[^0-9A-Za-z]"
        End With
    End If
    OptimizedRegex = stripper.Replace(strSource, vbNullString)
End Function
OptimizedRegex:                            1.094727 

EDIT: Bonus implementation!

It occurred to me that a hash table lookup might be faster than a Select Case structure, so I built one with using a Scripting.Dictionary:

Private hash As Scripting.Dictionary  'Module level

Function HashLookups(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte

    chars = CStr(source)
    Dim bound As Long
    bound = UBound(chars)
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    With hash
        For i = 0 To bound Step 2
            Dim temp As Byte
            temp = chars(i)
            If .Exists(temp) Then
                outVal(pos) = temp
                pos = pos + 2
            End If
        Next
    End With
    ReDim Preserve outVal(pos)
    HashLookups = outVal
End Function

Private Sub LoadHashTable()
    Set hash = New Scripting.Dictionary
    Dim i As Long
    For i = 48 To 57
        hash.Add i, vbNull
    Next
    For i = 65 To 90
        hash.Add i, vbNull
    Next
    For i = 97 To 122
        hash.Add i, vbNull
    Next
End Sub

'Test code:
    starting = Timer
    LoadHashTable
    For i = 1 To 1000000
        dummy = HashLookups(sample)
    Next i
    Debug.Print "HashLookups: ", Timer - starting

It turned out to be not too shabby:

HashLookups:                               1.655273

Final Version

Woke up and thought I'd try a vector lookup instead of a hash lookup (just fill a byte array of values to keep and use that for tests). This seems reasonable in that it's only a 256 element array - basically a truth table:

Private lookup(255) As Boolean 'Module level

Function VectorLookup(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte

    chars = CStr(source)
    Dim bound As Long
    bound = UBound(chars)
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    For i = 0 To bound Step 2
        Dim temp As Byte
        temp = chars(i)
        If lookup(temp) Then
            outVal(pos) = temp
            pos = pos + 2
        End If
    Next
    ReDim Preserve outVal(pos)
    VectorLookup = outVal
End Function

Private Sub GenerateTable()
    Dim i As Long
    For i = 48 To 57
        lookup(i) = True
    Next
    For i = 65 To 90
        lookup(i) = True
    Next
    For i = 97 To 122
        lookup(i) = True
    Next
End Sub

Assuming that the lookup table is only generated once, it's clocking in somewhere around 10-15% faster than any other pure VBA method above.

Comintern
  • 21,855
  • 5
  • 33
  • 80
  • this is a great post. – brettdj Dec 03 '16 at 08:20
  • Thanks for the multiple suggestions. I'm just trying your last one now and I think there's a couple of errors. I got an error on the 'With hash'/'End With' as I hadn't loaded the dictionary from your previous answer - I'm not sure you need this? Also, the ReDim statement removes the last character from the answer - I don't think you need the -2 to remove the last two bytes. Thanks very much though :) – Jonathan Dec 05 '16 at 11:25
  • @Jonathan - Copy and paste error - I completely forgot to remove the `With`. Corrected. – Comintern Dec 05 '16 at 13:14
5

Credit to ThunderFrame (I'm a sucker for a LHS Mid$) but I got better performance from the early bound RegExp with additional small tweaks:

  • Use Value2 rather than Value
  • Declare your loop with long not integer
  • .ignorecase = True is redundant

code

    Sub Replace2()

    Dim inputSh As Worksheet
    Dim inputRng As Range
    Set inputSh = Sheets("Data")
    Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
    Set outputSh = Sheets("Replace")
    Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
    time1 = MicroTimer

    Dim arr As Variant
    Dim objRegex As VBScript_RegExp_55.RegExp
    Dim i As Long

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
            .Global = True
            .Pattern = "[^\w]"
    End With

    arr = inputRng.Value2
    For i = LBound(arr) To UBound(arr)
            arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
    Next i
    outputRng.Value2 = arr

    time2 = MicroTimer
    Debug.Print (time2 - time1) * 1000
    End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
4

If you change the function in your first, and currently best performing routine, to the following, you'll get a performance improvement of at least 40-50% depending on your data:

Function AlphaNumericOnly(strSource As Variant) As String
    Dim i As Long
    Dim charCount As Long
    Dim strResult As String
    Dim char As String
    strResult = Space$(Len(strSource))
    For i = 1 To Len(strSource)
        char = Mid$(strSource, i, 1)
        Select Case Asc(char)
            Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                charCount = charCount + 1
                Mid$(strResult, charCount, 1) = char
        End Select
    Next
    AlphaNumericOnly = Left$(strResult, charCount)
End Function

I used a few optimizations, but chiefly, you were re-assigning the strResult multiple times in a loop, which is very expensive, and even more expensive when your strings are longer (and the loop runs more times). Much better to use Mid$.

And, using the $-suffixed functions are optimized for strings, so you'll get better performance there too

Optimizing the RegEx version

Your Regex approach has reasonable performance, but you're using late-bound CreateObject, which would be much faster as an early-bound, strongly typed reference.

Furthermore, your Regex pattern and options are the same every time, you could declare the regex object as variable, and only create it if it doesn't already exist, then re-use the existing regex each time.

ThunderFrame
  • 9,352
  • 2
  • 29
  • 60
  • That's fantastic, thanks. I'm getting averages of about 150ms, which is still a significant improvement (this will eventually run over around 6000 rows, so every millisecond saved is ~6s reduced processing time). I'm more than happy to accept it, but I'm just going to wait in case something else turns up. Thanks again :) – Jonathan Dec 02 '16 at 11:07
1

It is not true that Regex has to be the winner. My second solution below is faster than even early-bound Regex! And my first solution is as fast as late-bound Regex. BOTH ARE NATIVE VBA ONLY.

Interesting question. The Original InStr method should be much faster than the results shown in the OP's question.

Its poor performance is due to string concatenation, which VBA is not good at. The longer the strings the worse it gets.

My version of the InStr method below does not use concatenation at all. It is many times faster than the original. In fact, its speed of execution matches late-bound Regex. This InStr version is completely native to VBA and very, very fast. And the longer the source strings, the faster it gets, relative to concatenation.

This method also gains a few ticks of performance by utilizing the ($) version of string functions instead of the variant version. InStrB is slightly faster than InStr. And using temporary string variables t and arx saves a good chunk of time as well.

Sub InStr_ExcelHero()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("InStr")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim i&, j&, p&, max&, arx$, t$, res$, arr
        arr = inputRng
        max = Len(arr(1, 1))

    Dim validVals$: validVals = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

        For i = LBound(arr) To UBound(arr)
            p = 0
            arx = arr(i, 1)
            res = Space$(max)
            For j = 1 To max
                t = Mid$(arx, j, 1)
                If InStrB(validVals, t) Then
                    p = p + 1
                    Mid$(res, p, 1) = t
                End If
            Next
            arr(i, 1) = Left$(res, p)
        Next

    outputRng = arr

    time2 = MicroTimer
    Debug.Print (time2 - time1) * 1000

End Sub

And the ArrayLookup version below is more than twice as fast as InStr_ExcelHero().

In fact, the method below is actually faster than early-bound Regex!

This is native VBA. No dependencies. Faster than Regex. The following method is likely the quickest way to turn every element of an array to alphanumeric... when directed from VBA... other than a custom c++ dll:

Sub ArrayLookup_ExcelHero()

    Const VALS$ = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("InStr")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim i&, j&, p&, max&, t&, arx() As Byte, res() As Byte, arr
        arr = inputRng
        max = Len(arr(1, 1))

    Dim Keep&(0 To 255)
        For i = 1 To Len(VALS)
            Keep(Asc(Mid$(VALS, i, 1))) = 1
        Next

        For i = LBound(arr) To UBound(arr)
            p = 0
            ReDim res(0 To max)
            arx = StrConv(arr(i, 1), vbFromUnicode)
            For j = 0 To max - 1
                t = arx(j)
                If Keep(t) Then
                    res(p) = t
                    p = p + 1
                End If
            Next
            arr(i, 1) = StrConv(res, vbUnicode)
        Next

    outputRng = arr

    time2 = MicroTimer
    Debug.Print (time2 - time1) * 1000

End Sub
Excel Hero
  • 14,253
  • 4
  • 33
  • 40
0

I'll throw this out there, if nothing else to see how it performs. I'm sure it could be tidied up a bit too.

My hope is that the method for testing if a character is a letter turn out faster. I'm sure testing for a number could be done a bit quicker though.

Function AlphaNumeric(s As String) As String
    Dim char As String, tempStr As String
    Dim i As Integer
    Dim t As Variant

    For i = 1 To Len(s)
        char = Mid(s, i, 1)
        If IsLetter(char) Or IsNumber(char) Then
            tempStr = tempStr & char
        End If
    Next i
    AlphaNumeric = tempStr
End Function

Private Function IsLetter(s As String) As Boolean
    If UCase(s) = s And LCase(s) = s Then
        IsLetter = False
    Else:
        IsLetter = True
    End If
End Function

Private Function IsNumber(s As String)
    On Error GoTo 1
    s = s * 1
    IsNumber = True
    Exit Function
1:
    IsNumber = False
End Function
CallumDA
  • 12,025
  • 6
  • 30
  • 52
  • 2
    I'm getting an average of 371ms. However, I did have to change your function very slightly to make s a variant, not a string. Otherwise I get a ByRef argument type mismatch when running it on the array (which is originally declared as a variant). – Jonathan Dec 02 '16 at 12:59