2

I've been looking for a while for a code that would give me the digits between without using regex (I want my macro to be used by anyone especially non-computer friendly people). This is a small part of a code creating series for a chart dynamically creating the chart etc.
Here is the type of data I am dealing with "C23H120N5O4Cl" so I'd like to save in a variable 23 then in another one 120 the rest should not matter (it could be nothing). My digits will likely be between single characters (C,H,or else) but I need the numbers after C and H. So at the moment here is my code :

  RangeOccupied = Range("C2").End(xlDown).row


  For i = 1 To RangeOccupied

    If i <> RangeOccupied Then

      'Look for digits after C

      pos = InStr(1, Cells(i + 1, 2), "C") + 1
      pos1 = InStr(pos, Cells(i + 1, 2), "H")
      NumC = Mid(Cells(i + 1, 2), pos, pos1 - pos)

      'Look for digits after H

      pos = InStr(1, Cells(i + 1, 2), "H") + 1
      pos1 = InStr(pos, Cells(i + 1, 2), "O")
      NumH = Mid(Cells(i + 1, 2), pos, pos1 - pos)
    End If
  Next

Ideally I'd like the pos1 numbers not to be dependent on a specific character but any character. i.e having pos1=InStr(pos,Cells(i+1,2),"ANY NON-NUMBER CHARACTER").

I do not know if it is possible without using regex.

MrYuchiang
  • 23
  • 5
  • I wrote Hello, at the beginning but it does not show weirdly... – MrYuchiang Oct 04 '18 at 17:29
  • 1
    Why ***not*** use regex? It's the appropriate tool for the job. – Comintern Oct 04 '18 at 17:30
  • 1
    I use a mac (don't mac-shame me please) and when I started using regex it told me I needed an extra QuartZ add-on or something like that. which I could install but I assumed if I don't have it other mac users probably won't. – MrYuchiang Oct 04 '18 at 17:31
  • 1
    You'd likely get more suggestion if you could include some examples which show the type of data you're dealing with, and what values would be extracted from those examples. – Tim Williams Oct 04 '18 at 17:39
  • @TimWilliams I edited my post. Thanks for the advice ! – MrYuchiang Oct 04 '18 at 17:44
  • See previously: https://stackoverflow.com/questions/46091219/extract-numbers-from-chemical-formula , http://www.vbaexpress.com/kb/getarticle.php?kb_id=670 – Tim Williams Oct 04 '18 at 19:44
  • What about (eg) methane (CH4) or other "single instance" cases where the atomic symbol has no number following it? To be robust you really need a full list of symbols to account for this type of case... – Tim Williams Oct 04 '18 at 20:01
  • I was testing one of the method posted in the first link and this is what I realised... – MrYuchiang Oct 04 '18 at 20:06

6 Answers6

1

This function will return an array of the digit strings in a text string

Option Explicit
Function myDigits(str As String) As String()
    Dim col As Collection
    Dim I As Long, S() As String

I = 0
Set col = New Collection
Do Until I > Len(str)
    I = I + 1
    If IsNumeric(Mid(str, I, 1)) Then
        col.Add Val(Mid(str, I, Len(str)))
        I = I + 1
        Do Until Not IsNumeric(Mid(str, I, 1))
            I = I + 1
        Loop
    End If
Loop

ReDim S(0 To col.Count - 1)
    For I = 1 To col.Count
        S(I - 1) = col(I)
    Next I
myDigits = S
End Function
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Hi, thanks for your answer, I believe this solution is closed to the one @Kubie proposed. Unfortunately, I'd need to know that the numbers were indeed next a C or an H, so if someone decides to write "O5ClC3H10" I'd have numbers = 5 3 10 without knowing if it was before a C or an H right ? – MrYuchiang Oct 04 '18 at 18:11
  • 1
    @MrYuchiang That requirement was not listed in your question. What do you mean by **next to**? Before? After? both? what if the number is at the beginning or end of the string? – Ron Rosenfeld Oct 04 '18 at 18:13
  • Sorry, reading your answers made me realise thatwhat I need is to extract the digits right after a C (or/and an H) but since it is likely that they will be between other characters I thought I needed a more generic solution. I hope I am more clear. The numbers will never be at the beginning could be at the end though. – MrYuchiang Oct 04 '18 at 18:19
0

EDIT

Changed the function to use and return dictionaries having keys of "C" and "H" paired with their numbers. Included a screenshot below.

Made sure it handles for tricky situations where multiple letters are packed ontop of each other:

enter image description here

Code:

Sub mainLoop()

    Dim numbers As Scripting.Dictionary: Set numbers2 = New Scripting.Dictionary

    For i = 1 To 5
        Set numbers = returnDict(Cells(i, 1).Value)
        printout numbers, i
    Next

End Sub

Function returnDict(cellValue As String) As Scripting.Dictionary

    Dim i As Integer: i = 1
    Dim holder As String: holder = ""
    Dim letter As String

    Set returnStuff = New Scripting.Dictionary

    While i < Len(cellValue)
        If Mid(cellValue, i, 1) = "C" Or Mid(cellValue, i, 1) = "H" Then
            i = i + 1
            If IsNumeric(Mid(cellValue, i, 1)) Then
                letter = (Mid(cellValue, i - 1, 1))
                Do While IsNumeric(Mid(cellValue, i, 1))
                    holder = holder & Mid(cellValue, i, 1)
                    i = i + 1
                    If i > Len(cellValue) Then Exit Do
                Loop
                returnStuff.Add letter, holder
                holder = ""
            ElseIf Mid(cellValue, i, 1) <> LCase(Mid(cellValue, i, 1)) Then
                returnStuff.Add Mid(cellValue, i - 1, 1), "1"
            End If
        Else
            i = i + 1
        End If
    Wend
End Function

And heres a quick little function used to print out the contents of the dictionary

Sub printout(dict As Scripting.Dictionary, row As Integer)

    Dim i As Integer: i = 2

    For Each Key In dict.Keys
        Cells(row, i).Value = Key & ": " & dict.Item(Key)
        i = i + 1
    Next

End Sub
Kubie
  • 1,551
  • 3
  • 12
  • 23
  • Thanks @Kubie, so this function kind of give a collection of all the numbers in the string but we don't know next to what characters they all are. So, if say someone decides to write "O5ClC3H10" I'd have numbers = 5 3 10 without knowing if it was before a C an O a Cl...right ? – MrYuchiang Oct 04 '18 at 18:07
  • Hi @MrYuchiang if you need to store the characters also, you should look into using a Scripting.Dictionary instead of a collection. Then you can store the characters with the numbers. – Kubie Oct 04 '18 at 18:59
  • I'll look into that but this solution is not quite what I want since I need to know the numbers after C and H no matter their place in the string, they can be between two characters (non numeric) but also at the end of the string – MrYuchiang Oct 04 '18 at 19:04
0

Okay, I'm absolutely certain there is a more efficient way of doing this. But I think the following example makes it fairly clear on one way to separate your values.

Option Explicit

Sub test()
    Dim testValues() As String
    Dim val1 As Long
    Dim val2 As Long

    testValues = Split("C23H120N5O4Cl,C23O120N5H4Cl,C4H120", ",")

    Dim testValue As Variant
    For Each testValue In testValues
        ExtractValues testValue, val1, val2
        Debug.Print "For " & testValue & ": " & val1 & " and " & val2
    Next testValue
End Sub

Public Sub ExtractValues(ByVal inString As String, _
                         ByRef output1 As Long, _
                         ByRef output2 As Long)
    Dim outString1 As String
    Dim outString2 As String
    Dim stage As String
    stage = "Begin"

    Dim thisCharacter As String
    Dim i As Long
    For i = 1 To Len(inString)
        thisCharacter = Mid$(inString, i, 1)
        Select Case stage
            Case "Begin"
                If thisCharacter = "C" Then stage = "First Value"

            Case "First Value"
                If (Asc(thisCharacter) >= Asc("0")) And _
                   (Asc(thisCharacter) <= Asc("9")) Then
                    outString1 = outString1 & thisCharacter
                Else
                    '--- if we get here, we're done with this value
                    output1 = CLng(outString1)

                    '--- verify the next character is the "H"
                    If thisCharacter = "H" Then
                        stage = "Second Value"
                    Else
                        stage = "Next Value"
                    End If
                End If

            Case "Next Value"
                If thisCharacter = "H" Then stage = "Second Value"

            Case "Second Value"
                If (Asc(thisCharacter) >= Asc("0")) And _
                   (Asc(thisCharacter) <= Asc("9")) Then
                    outString2 = outString2 & thisCharacter
                Else
                    '--- if we get here, we're done with this value
                    output2 = CLng(outString2)
                    stage = "Finished"
                    Exit For
                End If
        End Select
    Next i

    If Not (stage = "Finished") Then
        output2 = CLng(outString2)
    End If
End Sub
PeterT
  • 8,232
  • 1
  • 17
  • 38
  • Thanks @PeterT I believe that this code does more than I need, in fact while reading answers and thinking I realised that I only need to know the digits on the right of a C (or/and an H) so it not necessary to check if the next character is the H for instance but I think we're getting closer to the solution. – MrYuchiang Oct 04 '18 at 18:16
  • After trying it I think that this would not work if it was C23O120N5H4Cl – MrYuchiang Oct 04 '18 at 18:47
  • Using the code "C23O120N5H4Cl" my solution returned 23 and 4. If you're saying that you'd want 23 and 120 instead, then that part of your original question was unclear. I'm glad you're getting solution ideas. That's what SO is all about. – PeterT Oct 04 '18 at 19:17
  • Great you're right it does work now I only need the scenario when the numbers are at the end of the string like: C4H120, in that case it returns val2 = 0 or if we have H120C4 it returns "0 and 0" – MrYuchiang Oct 04 '18 at 19:23
  • Please take a look at the updated code and see if it matches your requirements – PeterT Oct 04 '18 at 19:32
0

Here's another method that's more generic and efficient than my first solution. This approach uses a function to extract the number following a given substring -- in this case it's a single letter "C" or "H". The function accounts for the value being at the end of the input value as well.

Option Explicit

Sub test()
    Dim testValues() As String
    Dim val1 As Long
    Dim val2 As Long

    testValues = Split("C23H120N5O4Cl,C23O120N5H4Cl,C4H120", ",")

    Dim testValue As Variant
    For Each testValue In testValues
        val1 = NumberAfter(testValue, "C")
        val2 = NumberAfter(testValue, "H")
        Debug.Print "For " & testValue & ": " & val1 & " and " & val2
    Next testValue
End Sub

Private Function NumberAfter(ByVal inString As String, _
                             ByVal precedingString As String) As Long
    Dim outString As String
    Dim thisToken As String
    Dim foundThisToken As Boolean
    foundThisToken = False

    Dim i As Long
    For i = 1 To Len(inString)
        thisToken = Mid$(inString, i, 1)
        If thisToken = precedingString Then
            foundThisToken = True
        ElseIf foundThisToken Then
            If thisToken Like "[0-9]" Then
                outString = outString & thisToken
            Else
                Exit For
            End If
        End If
    Next i
    NumberAfter = CLng(outString)
End Function
PeterT
  • 8,232
  • 1
  • 17
  • 38
0

I found this solution from here Extract numbers from chemical formula

Public Function ElementCount(str As String, element As String) As Long
    Dim i As Integer
    Dim s As String

    For i = 1 To 3
        s = Mid(str, InStr(str, element) + 1, i)
        On Error Resume Next
        ElementCount = CLng(s)
        On Error GoTo 0
    Next i
End Function

Which works but if simple molecules like CH4 are put in it does not work since no number are shown... but I (we) can probably work that out.

Thanks again for all the solutions !

EDIT:

Here is the function I use that I think takes all possible scenarios into account ! Thanks again for your help !

Public Function ElementCount(str As String, element As String) As Long
    Dim k As Integer
    Dim s As String

    For k = 1 To Len(str)

        s = Mid(str, InStr(str, element) + 1, k)

        On Error Resume Next
        ElementCount = CLng(s)
        On Error GoTo 0

        If InStr(str, element) > 0 And ElementCount = 0 Then

           ElementCount = 1

        End If
    Next k

End Function
MrYuchiang
  • 23
  • 5
  • That seems to only return the very first set of digits that is preceded by the `element`. And it removes the leading zero. – Ron Rosenfeld Oct 05 '18 at 00:56
  • It actualy works well when I incorpaorate it in my loop no matter the place of C or H or O or anything and does return 1 when there is a letter but no number. Maybe it is my loop that makes it work ? – MrYuchiang Oct 05 '18 at 09:59
0

My 2c:

Sub tester()
    Dim r, arr, v
    arr = Array("C", "Z", "Na", "N", "O", "Cl", "Br", "F")
    For Each v In arr
        Debug.Print v, ParseCount("C15H12Na2N5O4ClBr", v)
    Next v
End Sub

Function ParseCount(f, s)

    Const ALL_SYMBOLS As String = "Ac,Al,Am,Sb,Ar,As,At,Ba,Bk,Be,Bi,Bh,Br,Cd,Ca,Cf,Ce,Cs,Cl," & _
     "Cr,Co,Cn,Cu,Cm,Ds,Db,Dy,Es,Er,Eu,Fm,Fl,Fr,Gd,Ga,Ge,Au,Hf,Hs,He,Ho,In,Ir,Fe,Kr,La,Lr," & _
     "Pb,Li,Lv,Lu,Mg,Mn,Mt,Md,Hg,Mo,Mc,Nd,Ne,Np,Ni,Nh,Nb,No,Og,Os,Pd,Pt,Pu,Po,Pr,Pm,Pa,Ra," & _
     "Rn,Re,Rh,Rg,Rb,Ru,Rf,Sm,Sc,Sg,Se,Si,Ag,Na,Sr,Ta,Tc,Te,Ts,Tb,Tl,Th,Tm,Sn,Ti,Xe,Yb,Zn," & _
     "Zr,B,C,F,H,I,N,O,P,K,S,W,U,V,Y"

    Dim atoms, rv, pos, i As Long
    atoms = Split(ALL_SYMBOLS, ",")

    rv = 0 'default return value

    If IsError(Application.Match(s, atoms, 0)) Then
        rv = -1        'not valid atomic symbol
    Else
        i = 1
        pos = InStr(i, f, s, vbBinaryCompare)
        If pos > 0 Then
            If Len(s) = 2 Then
                'should be a true match...
                rv = ExtractNumber(f, pos + 2)
            ElseIf Len(s) = 1 Then
                'check for false positives eg "N" matches on "Na"
                Do While pos > 0 And Mid(f, pos + 1, 1) Like "[a-z]"
                    i = pos + 1
                    pos = InStr(i, f, s, vbBinaryCompare)
                Loop
                If pos > 0 Then rv = ExtractNumber(f, pos + 1)
            Else
                'exotic chemistry...
            End If
        End If
    End If
    ParseCount = rv
End Function

'extract consecutive numeric digits from f starting at pos
' *returns 1 if no number present*
Function ExtractNumber(f, pos)
    Dim rv, s, i As Long
    Do While (pos + i) <= Len(f)
        If Not Mid(f, pos + i, 1) Like "#" Then Exit Do
        i = i + 1
    Loop
    ExtractNumber = IIf(i = 0, 1, Mid(f, pos, i))
End Function
Tim Williams
  • 154,628
  • 8
  • 97
  • 125