Final final results:
I was wondering whether the results below changed if the string was longer. I ran exactly the same tests on the same computer, except each cell had a random string of 34 characters rather than four. These were the results:
Comintern (Regexp): 136.1 ms
brettdj (Regexp): 139.9 ms
Slai (Regexp): 158.4 ms
*Original Regex: 161.0 ms*
Comintern (AN): 170.1 ms
Comintern (Hash): 183.6 ms
ThunderFrame: 232.9 ms
*Original replace: 372.9 ms*
*Original InStr: 478.1 ms*
CallumDA33: 1218.1 ms
This really shows the speed of the Regex - all the solutions utilising Regex.replace are significantly faster, with the best being Comintern's implementation.
In summary, if the strings are long, use arrays, if they are short, use the clipboard. If unsure, the optimal result is to use arrays, but this may sacrifice a little performance on short strings.
Final results:
Thanks very much for all of your suggestions, clearly I still have a lot to learn. I was thinking about this all yesterday, so I decided to rerun everything at home. Here are the final results, based on applying each of these to 30,000 four character strings.
My computer at home is an Intel i7 @ 3.6 GHz, 8GB RAM, 64-bit Windows 10 and Excel 2016. Similar conditions to before in that I have processes running in the background, but I'm not actively doing anything throughout the tests.
Original replace: 97.67 ms
Original InStr: 106.54 ms
Original Regex: 113.46 ms
ThunderFrame: 82.21 ms
Comintern (AN): 96.98 ms
Comintern (OR): 81.87 ms
Comintern (Hash): 101.18 ms
brettdj: 81.66 ms
CallumDA33: 201.64 ms
Slai: 68.38 ms
I've therefore accepted Slai's answer as it is clearly the fastest for general implementation, but I'll rerun them all at work against the actual data to check this still works.
Original post:
I have an array in Excel that is a list of part numbers. I need to turn every member of the array alphanumeric, for example
ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001 -> ABC123001
What is the fastest way of doing this?
For context, our part numbers can come in different forms, so I'm writing a function that finds the best match within a given range. At the moment, the part of the function that makes everything alphanumeric takes about 50ms to run, whereas the rest of the function takes around 30ms in total. I also can't avoid using Excel.
I've done some work myself (see answer below), but the main issue is that I have to loop through every element of the array one-by-one - could there be a better way? I've also never run tests before, so any feedback on improving them would be much appreciated.
Here is what I've tried so far.
I'm using MicroTimer and my computer has an Intel i5 @2.5GHz, 4GB of RAM, 64-bit Windows 7. I've got processes running in the background, but I'm not actively doing anything else whilst these are run.
I created 30,000 lines of random symbols using this code:
=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))
(note how we stop the first character at 60 because '=' is char(61)
and we want to avoid Excel interpreting this as a formula. Also we force the second character to be a number so we can guarantee at least one alphanumeric character in there.)
1. Using a loop based on cases. Average time: 175ms
Using the function in this post, we load the range into an array, apply the function to each element of the array and paste it back. Code:
Function AlphaNumericOnly(strSource As Variant) As String
Dim i As Integer
Dim strResult As String
For i = 1 To Len(strSource)
Select Case Asc(Mid(strSource, i, 1))
Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
strResult = strResult & Mid(strSource, i, 1)
End Select
Next
AlphaNumericOnly = strResult
End Function
Sub Replace()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Replace")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = AlphaNumericOnly(arr(i, 1))
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
2. Using InStr() to check each character. Average time: 201ms
Define a string of valid values. Check one-by-one if the valid values appear in the array elements:
Sub InStr()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("InStr")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim validValues As String
validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely'
Dim i As Integer, j As Integer
Dim result As String
For i = LBound(arr) To UBound(arr)
result = vbNullString
For j = 1 To Len(arr(i, 1))
If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then
result = result & Mid(arr(i, 1), j, 1)
End If
Next j
arr(i, 1) = result
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
3. Using regex.Replace on the array. Time: 171ms
Define a regex and use this to replace each element of the array.
Sub Regex()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Regex")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.ignorecase = True
.Pattern = "[^\w]"
End With
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
Edit:
@ThunderFrame - our part numbers generally come in the following formats:
- All numbers (e.g. 32523452)
- Mix of letters and numbers (e.g. AB324K234 or 123H45645)
- Mix of letters and numbers, each linked by a non-alphanumeric character (e.g. ABC001-001, ABC001/001, 123/4557-121)
I have thought about using regex.test on each string before launching into the replacement, but I'm not sure if this will just copy the string to then test it, in which case I may as well just make the replacement to start with.
@Slai - thanks for the link - I will look into that in more detail