2

I have read some topics explaining how to do this, which would be incredibly slow. The explanation is here: https://www.extendoffice.com/documents/excel/651-excel-remove-non-numeric-characters.html

It involves iterating through each cell in a range and then iterating through the characters in the field and removing them if they do not match [0-9].

Any suggestions to do this more efficiently?

One that comes to mind is loading the cell contents into an array, iterating through it, and splitting each entry into its own array to iterate through.

Community
  • 1
  • 1
cjones
  • 8,384
  • 17
  • 81
  • 175
  • How exactly would you iterate something without a loop? – Comintern Aug 05 '16 at 18:51
  • You can't avoid looping through the cells, but you don't need to loop through characters, just use regular expressions: http://stackoverflow.com/a/22542835/4721734. Of course you can load the range into an array to improve performance even more – Máté Juhász Aug 05 '16 at 19:03
  • @MátéJuhász - I'd be curious to see how a regex replace would compare to exploding into a byte array. It would likely destroy `Mid` in performance though... – Comintern Aug 05 '16 at 19:09

3 Answers3

3

No need for VBA or for looping. An excel formula can achieve what you want.

=NPV(-0.9,,IFERROR(MID(A1,1+LEN(A1)-ROW(OFFSET(A$1,,,LEN(A1))),1)%,""))

This is an array formula. You have to press Ctrl + Shift + Enter

enter image description here

Explanation:

Each term is multiplied by the inverse of (1+rate)^n, where n is the nth term in the series.

By using different values for rate, we can get different results. In this case, using -0.9 gives us 1 + rate = 1 + -0.9 = 0.1.

Result: {0.1;0.01;0.001;0.0001;0.00001}
Inverse of above: {10;100;1000;10000;100000}
Also NPV skips text values which contributes to the above

Disclaimer: I did not come up with this formula. I had seen this formula long time ago and simply fell in love with it. Since then it has been a part of my databank.

Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
  • Sid, I'm not good with formulas, but this one seem to just extract the numeric characters. My question, can it be modified to replace the alpha characters with a " " or a "-", etc.? Sorry, i followed the hyperlink you used in a current question. – GMalc Mar 28 '19 at 17:39
3

For the VBA side of things (note the loops), I decided to satisfy my own curiosity about the performance of a couple different methods. All of them pull the range into an array and work on it in place. The linked article will get killed in speed by any of these, simply due to the overhead in reading and writing single cell values.

For the first method, I optimized the code from the linked article "a bit":

Private Sub MidMethod(values() As Variant)
    Dim r As Long, c As Long, i As Long
    Dim temp As String, output As String

    For r = LBound(values, 1) To UBound(values, 1)
         For c = LBound(values, 2) To UBound(values, 2)
            output = vbNullString
            For i = 1 To Len(values(r, c))
                temp = Mid$(values(r, c), i, 1)
                If temp Like "[0-9]" Then
                    output = output & temp
                End If
            Next
            values(r, c) = output
         Next
    Next
End Sub

For the second method I used RegExp.Replace:

Private Sub RegexMethod(values() As Variant)
    Dim r As Long, c As Long, i As Long

    With New RegExp
        .Pattern = "[^0-9]"
        .MultiLine = True
        .Global = True
        For r = LBound(values, 1) To UBound(values, 1)
             For c = LBound(values, 2) To UBound(values, 2)
                values(r, c) = .Replace(values(r, c), vbNullString)
             Next
        Next
    End With
End Sub

Finally, for the last method I used a Byte array:

Private Sub ByteArrayMethod(values() As Variant)
    Dim r As Long, c As Long, i As Long
    Dim chars() As Byte

    For r = LBound(values, 1) To UBound(values, 1)
         For c = LBound(values, 2) To UBound(values, 2)
            chars = values(r, c)
            values(r, c) = vbNullString
            For i = LBound(chars) To UBound(chars) Step 2
                If chars(i) > 47 And chars(i) < 58 Then
                    values(r, c) = values(r, c) & Chr$(chars(i))
                End If
            Next
         Next
    Next
End Sub

Then I used this code to benchmark them against 1000 cells, each containing a random mix of 25 letters and numbers:

Private Sub Benchmark()
    Dim data() As Variant, start As Double, i As Long

    start = Timer
    For i = 1 To 5000
        data = ActiveSheet.Range("A1:J100").Value
        MidMethod data
    Next
    Debug.Print "Mid: " & Timer - start

    start = Timer
    For i = 1 To 5000
        data = ActiveSheet.Range("A1:J100").Value
        RegexMethod data
    Next
    Debug.Print "Regex: " & Timer - start

    start = Timer
    For i = 1 To 5000
        data = ActiveSheet.Range("A1:J100").Value
        ByteArrayMethod data
    Next
    Debug.Print "Byte(): " & Timer - start

End Sub

The results weren't horribly surprising - the Regex method is by far the fastest (but none of them are what I'd call "fast"):

Mid: 24.3359375 
Regex: 8.31640625 
Byte(): 22.5625

Note that I have no idea how this compares to @SiddharthRout's cool formula method in that I can't run it through my testing harness. The www.extendoffice.com code would also probably still be running, so I didn't test it.

Comintern
  • 21,855
  • 5
  • 33
  • 80
  • 1
    @brettdj and I had this discussion few years ago and we tested it... Mid always will be slower than RegEx which can be significantly noticed when tested with large dataset. – Siddharth Rout Aug 05 '16 at 20:14
  • @SiddharthRout - I was more surprised at the byte array's poor performance. I can shave a couple seconds off by replacing `Chr$(chars(i))` with `chars(i) - 48`, but it looks like the main performance bottleneck in all of them is memory copying from the string concats. – Comintern Aug 05 '16 at 20:17
  • can you please test my formula with your data set. just enter it in one cell and double click on the bottom right of the first cell to autofill. Can you see how many seconds approx does it take to calculate 1000 cells? – Siddharth Rout Aug 05 '16 at 20:20
  • I think I missed the pun? :D – Siddharth Rout Aug 05 '16 at 20:23
  • @SiddharthRout - The benchmarks were 5000 iterations over 1000 cells. I just timed the NPV method on 1m cells - It took roughly 8 seconds, but it isn't exactly a straight comparison without the writes in the benchmarks above. – Comintern Aug 05 '16 at 20:38
  • Ah Ok. I was wondering where did you come up with that figure :D – Siddharth Rout Aug 06 '16 at 05:35
3

Using regex (you need the library Microsoft VBScript Regular Expression 5.5 under Tools-References):

Public Function GetNumericValue(range)
    Set myRegExp = New RegExp
    myRegExp.IgnoreCase = True
    myRegExp.Global = True
    myRegExp.Pattern = "[\D]"

    GetNumericValue = myRegExp.Replace(range.value, "")
End Function
Techky
  • 431
  • 4
  • 10