0

I have been using some useful VBA code by PEH that uses regular expression to extract the number of instances of a specific element in a chemical formula, see: https://stackoverflow.com/a/46091904/17194644

It works well, but everything slows down when I use the function hundreds of times in one worksheet. I was wondering if this might be due to the time it takes VBA to read/write values from/to the cells, so I created an array function (based on the regex code by PEH) to see if it would speed things up, see below. The function works and is quicker but can still slow things down when dealing with hundreds of values, and I cannot get the second part to work that finds multiplies elements within parenthesis. Any thoughts on how to improve further?

Function CountElements(ChemFormulaRange As Variant, ElementRange As Variant) As Variant

'define variables
Dim RetValRange() As Long
Dim RetVal As Long
Dim ChemFormula As String
Dim npoints As Long
Dim i As Long
Dim mpoints As Long
Dim j As Long

' Convert input ranges to variant arrays
If TypeName(ChemFormulaRange) = "Range" Then ChemFormulaRange = ChemFormulaRange.Value
If TypeName(ElementRange) = "Range" Then ElementRange = ElementRange.Value

'parameter
npoints = UBound(ChemFormulaRange, 1) - LBound(ChemFormulaRange, 1) + 1
mpoints = UBound(ElementRange, 2) - LBound(ElementRange, 2) + 1

'dimension arrays
ReDim RetValRange(1 To npoints, 1 To mpoints)

'calculate all values
For j = 1 To mpoints
Element = ElementRange(1, j)
For i = 1 To npoints
RetVal = 0
ChemFormula = ChemFormulaRange(i, 1)
Call ChemRegex(ChemFormula, Element, RetVal)
RetValRange(i, j) = RetVal
Next i
Next j

'output answer
CountElements = RetValRange

End Function
Private Sub ChemRegex(ChemFormula, Element, RetVal)
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With
    
    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)
    
    Dim m As Match
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            RetVal = RetVal + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        End If
    Next m
    
    'second patternd finds parenthesis and multiplies elements within
'    regEx.Pattern = "(\((.+?)\)([0-9])+)+?"
'    Set Matches = regEx.Execute(ChemFormula)
'    For Each m In Matches
'        RetVal = RetVal + ChemFormula(m.SubMatches(1), Element) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
'    Next m
End Sub
Daniel
  • 57
  • 9

3 Answers3

3

If you are using Office 365, then you do not need VBA. A formula can achieve what you want and I think it would be faster.

=TRIM(TEXTJOIN("",TRUE,IFERROR((MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)*1)," ")))

enter image description here

Note: If you still need a VBA solution then remember you can enter the above formula in the entire range in one go and then convert it to values.

rng.Formula = "=TRIM(TEXTJOIN("""",TRUE,IFERROR((MID(A1,ROW(INDIRECT(""1:""&LEN(A1))),1)*1),"" "")))"
rng.Value = rng.Value
Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
  • Fine solution +:) - Fyi Posted an alternative with a possible spill range output based on a single array comparison. – T.M. Dec 10 '21 at 19:49
2

The slowest part of your ChemRegex routine is creating the RegExp object.

If all your cells are passed to CountElements as a pair of large areas move the code that creates the RegExp object and applies a few properties from ChemRegex to CountElements, and pass the RegExp reference from CountElements to ChemRegex.

Or, if you are calling CountElements as say a UDF in multiple cells, declare RegExp at module level

Private RegEx as RegExp

In CountElements...

If RegEx is Nothing Then
    Set RegEx = New RegExp
    ' apply the properties
End If
' code
' and pass RegEx to ChemRegex
Call ChemRegex(ChemFormula, Element, RetVal, RegEx)
Peter T
  • 306
  • 1
  • 5
0

Isolate all numbers in chemical formula

Just for the sake of the art an alternative to Siddharth 's approach, where I demonstrate how to use Match() comparing

  • an array of each formula character|digit in the given string with
  • an array of all regular digits.

This allows to identify array elements (here: digits) based on their position. So this demo might be also helpful to solve similar requirements. - I don't pretend this to be a better or faster way.

Function ChemNo(ByVal s As String) As Variant
'Purp: return array of found character positions in chars string
'Note: (non-findings show Error 2042; can be identified by IsError + Not IsNumeric)
    Dim digits
    digits = String2Arr("1234567890")
    'get any digit position within array digits     ' note: zero position returns 10
    Dim tmp
    tmp = Application.Match(String2Arr(s), digits, 0)
    'check for digits in a loop through tmp
    Dim i As Long, ii As Long
    For i = 1 To UBound(tmp)
        If IsNumeric(tmp(i)) Then                   ' found digit
            tmp(i) = tmp(i) Mod 10                  ' get digtis including zeros
            If IsNumeric(tmp(i - 1)) Then           ' check preceding digit
                tmp(i) = 10 * tmp(i - 1) + tmp(i)   ' complete number
                tmp(i - 1) = "!"                    ' mark former digit
            End If
        Else
            tmp(i) = "!"                            ' mark non-numeric element
        End If
    Next i
    
    ChemNo = Filter(tmp, "!", False)                ' delete marked elements
End Function

Help function String2Arr()

Assigns an array of single characters after atomizing a string input:

Function String2Arr(ByVal s As String) As Variant
'Purp: return array of all single characters in a string
'Idea: https://stackoverflow.com/questions/13195583/split-string-into-array-of-characters
    s = StrConv(s, vbUnicode)
    String2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function

If you want to use the function as tabular input profiting from the newer dynamic features in Excel, you may enter it as user defined function e.g. in cell B1: =ChemNo(A1) displaying each number horizontally in as so called spill range. Using older versions, I suppose you'd need a CSE entry (Ctrl) to mark it as {array} formula.

udf example

T.M.
  • 9,436
  • 3
  • 33
  • 57