1

I scraped text from old files and need to get numerical data placed within strings.

Strings look like:

"season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg
"season: 1983 colony:- colony weight: 5 kg yeild: 14 kg"

I made a function that takes a raw data string and returns an array of integers.

Function getClearBeeData(rawData As Variant) As Integer()
  Dim retValue(4) As Integer 'array where each found number stored
  Dim strTempString As String 'temporary string to hold current number
  Dim i, k As Integer 'i counter for original string, k counter for array position
  Dim token As Boolean 'token shows whether previous chars were number
  token = False
  
  For i = 1 To Len(rawData)   'go through original string
  
    If IsNumeric(Mid(rawData, i, 1)) Then   'if current char is numeric
      strTempString = strTempString & Mid(rawData, i, 1)  'add current char to remporary string
      token = True  'show that current position is within numbers
    ElseIf Mid(rawData, i, 1) = Chr(45) Then  'in string a symbol "-" can appear
      strTempString = "0"
      token = True
    ElseIf Not IsNumeric(Mid(rawData, i, 1)) And token = True Then  'if current char is not numeric and token shows that previous char was number
      retValue(k) = CInt(strTempString)   'convert temporary string to int and write in to the array
      k = k + 1   'go to next array position
      token = False   'switch token to show that current position is not within numbers
      strTempString = ""  'delete stored data from temporary string
    End If
    
  Next
  
  If Len(strTempString) > 0 Then
    retValue(k) = CInt(strTempString) 'if original string ended with numbers, write that numbers to array
  End If
  getClearBeeData = retValue
End Function

Test sub to print data.

Sub printClearBeeData()
  Dim rawData As String
  Dim clearDataArr() As Integer
  Dim i As Integer
  rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
  clearDataArr = getClearBeeData(rawData)
  For i = LBound(clearDataArr) To UBound(clearDataArr) - 1
    Debug.Print clearDataArr(i)
  Next
End Sub

Everything works. Could I do it better? (As I work alone nobody can code review.)
I didn't use regular expressions because I don't know them.

Wasif
  • 14,755
  • 3
  • 14
  • 34
yar
  • 47
  • 6
  • There is a place to code review(when the code works), named like that: https://codereview.stackexchange.com/ so they can help you there for this matter :) – Damian May 06 '20 at 07:49
  • I just noticed the last sentences as you avoided regex on purpose, but I would argue it's a valid and rather easy alternative. – JvdV May 06 '20 at 07:59
  • Wow, thanks. Will go there. – yar May 06 '20 at 08:00
  • JvdV, yes, I feel the power of regex :) but need some practice. – yar May 06 '20 at 08:01
  • I think your question is not necessarily off-topic over here either. So would your string pattern always be the same? And are you interested in those 4 numeric substrings? – JvdV May 06 '20 at 08:03
  • Yep, it's always the same but time to time "-" appears where there were no data from origin. – yar May 06 '20 at 08:09
  • 1
    I recommend to have a look into Regular Expressions and [how to use them in Excel](https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops). This should easily solve your issue. See [this pattern](https://regex101.com/r/Wui1jo/1). – Pᴇʜ May 06 '20 at 08:17
  • Thanks a lot, apparently it's time to study regexp. "It's a sign" )) – yar May 06 '20 at 08:23

4 Answers4

5

Since your string pattern would always be the same (where one or multiple substrings could be represented by "-"), RegEx could become a simple, easy to implement tool to retrieve these substrings. A very basic example to get you going:

Sub Test()

Dim rawData As String, ClearBeeData as Object

rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "\d+|-"
    Set ClearBeeData = .Execute(rawData)
End With

For Each el In ClearBeeData
    Debug.Print Val(el)
Next

End Sub

The link provided by @Peh will give you tons of information, but some small remarks to make here:

  • .Global = True - If this was set to False we can also retrieve the first match. When set to True we can retrieve all matches.
  • .Pattern = "\d+|-" - A very basic pattern to search within the full string where \d is simply short for character class [0-9] and + searches for a substring of at least one character (or longer). The pipe-symbol represents OR so if the position doesn't hold any numbers, we can search for the - alternatively.
  • .Execute(...) - Will return a Matchcollection type object of matches that are found. Note that it would yield an error when no matches can be found. No big deal here when strings have the same pattern, but for future reference you might want to include a .Test before we try to .Execute.
  • Val(el) - Since all elements in the returned collection are text, we can use Val to return them as numbers if you so desire. Cool feature is that Val("-") will return 0. So for above example your results will be:

    1983
    12
    0
    16
    
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • Thanks a lot, it looks much shorter and easier than my approach. I'm going to study this theme: https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops – yar May 06 '20 at 08:35
  • 2
    Like your additional helpful comments to this valid answer +:). FYI might be interested in my (late) answer, too. – T.M. May 07 '20 at 09:18
3

Test also this (standard VBA) piece of code, please. It is a little shorter and simpler then yours:

Private Function extractIntegers(rowData As String, strDelim As String) As Variant
 Dim arr As Variant, arrInt(3) As Variant, i As Long

 arr = Split(rowData, strDelim)

 For i = 1 To UBound(arr)
    arrInt(i - 1) = val(arr(i))
 Next i
 extractIntegers = arrInt
End Function

A procedure to test it, would be:

Sub testexractIntegers()
  Dim rowData As String, El As Variant, arrInt As Variant
  rowData = "season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg"
  'rowData = "season: 1983 colony: - colony weight: 4 kg yeild: 12 kg"
  arrInt = extractIntegers(rowData, ": ")
  For Each El In arrInt
    Debug.Print Int(El)
  Next
End Sub

If you uncomment the second rowData definition (containing "-"), the array will return 0 for that specific array element

FaneDuru
  • 38,298
  • 4
  • 19
  • 27
  • Thanks, it works. But only when we based on idea that string always have ":" as delimiter. I just wanted to make a bit universal method. Although I would prefer your solution in my program because it is much clear and simpler. – yar May 06 '20 at 12:01
  • @HuskyTheFur: Would you also have a different delimiter? In such a case, a `strDelim` variable will be created and adapted according to the necessary one. I will adapt the code to accept a variable delimiter... Adapted! – FaneDuru May 06 '20 at 12:05
  • 1
    yep, i thought of variable that fits a delimiter. – yar May 06 '20 at 12:12
  • 1
    FaneDuru, fun alternative: `arrInt(i - 1) = val(arr(i))`....see what happens when you look for strings with a `-` instead of a number =). Upvoted your answer. – JvdV May 06 '20 at 12:18
3

Split combined with Filter allows one loop only

Just for fun and in addition to @FaneDuru 's valid answer, I demonstrate how to use the Split() function combined with Filter() thus avoiding an additional transfer of array items in a help function by a second loop.

Sub TrickySplit()
    Dim rawData As String
    rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"

    '[1] split to tokens omitting the 1st element "seasons"
    '    ~> 1983 colony, 12 colony weight, - kg yeild, 16 kg
    Dim words: words = Split(rawData, ": "): words = Filter(words, words(0), False)

    '[2] convert number strings or "-" to values
    Dim i: For i = 0 To UBound(words): words(i) = Val(words(i)): Next

    '[3] optional (display results in VB Editors Immediate Window
    '    ~> 1983,12,0,16
    Debug.Print Join(words, ",")
End Sub

Further hints

ad) [1]: the classical Split() via delimiter : " results in a 0-based array:

   season|1983 colony|12 colony weight|- kg yeild|16 kg 

The first item words(0) ~> season isn't needed and can be removed immediately by the Filter() function resulting in:

   1983 colony, 12 colony weight, - kg yeild, 16 kg

ad) [2]: the Val() function words(i) = Val(words(i)) uses the fact that it ignores following characters even interpreting "-" as 0

(So it's not necessary to execute a theoretical second split via words(i) =Val(Split(words(i), " ")(0)) to remove the appendix-string).

T.M.
  • 9,436
  • 3
  • 33
  • 57
  • 1
    Interesting the array filtering way, but in this case, starting the array elements iteration from 1, the first element would be skipped... Anyhow, it is an interesting idea in case of having the specific element(s) to be excluded in the middle of the array and wanting to avoid `If ... End If`. I will vote the code up... – FaneDuru May 07 '20 at 09:29
  • 1
    Thanks for supporting my answers as you do. Always friendly and constructive answers of your own. Keep up the good work + – JvdV May 07 '20 at 09:59
  • @FaneDuru Thx for feedback. - Responding to your hint to iterate from 1: this would only regard display, but wouldn't change the `words` array itself :-; – T.M. May 07 '20 at 11:23
  • 1
    Agreed. But, in this case, no need to keep the (cleaned) array after doing the job, I would think... :) – FaneDuru May 07 '20 at 12:53
1

A solution which does not rely on regexp but which does require that numbers are always surrounded by spaces.

The function returns a scripting dictionary of Index vs Number where Index is the first character of the number in the raw data string and Number is the numeric string converted to Type Double.

Option Explicit

Sub TestGetClearBeeData()

    Dim mySD As Scripting.Dictionary

    Set mySD = getClearBeeData("season: 1983 colony: - colony weight: 5 kg yeild: 14 kg")

    Dim myItem As Variant
    For Each myItem In mySD

        Debug.Print myItem, mySD.Item(myItem)

    Next

End Sub


Function getClearBeeData(ByVal ipRawData As String) As Scripting.Dictionary

    Dim myItems As Variant
    myItems = Split(Replace(ipRawData, "-", "0"))

    Dim myItem As Variant
    Dim myNumbers As Scripting.Dictionary: Set myNumbers = New Scripting.Dictionary
    Dim myLen As Long: myLen = 1
    For Each myItem In myItems

        If IsNumeric(myItem) Then

            myNumbers.Add myLen, CDbl(myItem)

        End If

        myLen = myLen + Len(myItem) + 1

    Next

    Set getClearBeeData = myNumbers

End Function
freeflow
  • 4,129
  • 3
  • 10
  • 18
  • Hmm, apparently i have to read a bit about "scripting dictionary", don't understand this syntax a bit. – yar May 06 '20 at 12:10