0

I have the following Text sample:

Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May

I want to get the number 079, So what I need is the first instance of digits of length 3. There are certain times the 3 digits are at the end, but they usually found with the first 2 underscores. I only want the digits with length three (079) and not 19, 1920, or 2554 which are different lengths.

Sometimes it can look like this with no underscore:

1920 O-B CLI 353 Tar Traf

Or like this with the 3 digit number at the end:

Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079

There are also times where what I need is 2 digits but when it's 2 digits its always at the end like this:

FY1920-Or-OLV-B-45 

How would I get what I need in all cases?

atgold18
  • 31
  • 6
  • So, are you able to set a rule able to fit all the possibilities? What do you need to extract from the string "1920_US_FY19/20_A2554_Si Resp_2_May"? – FaneDuru Jul 22 '20 at 14:04
  • @FaneDuru Sorry is wasnt clear b4, that was part of the same string I edited it. – atgold18 Jul 22 '20 at 14:07
  • Which digit would you want from this: `1920 O-B CLI 353 Tar Traf`. There are 2 sets of digits there that meet your criteria – Zac Jul 22 '20 at 14:16
  • @Zac I would want 353 with the length of 3 digits – atgold18 Jul 22 '20 at 14:17
  • How do you identify that you have to execute a 2-digits search only? Will this be if your tokens are connected by "-" instead of underscores and blanks pointing to the 3-digit alternative @atgold18? – T.M. Jul 22 '20 at 14:45
  • @T.M. I would execute a 2 digit search when there are no 3 digit numbers before the end part and the last 2 digits are 2. if 3 digits are fount at end then get 3 but if not then get 2. there are times when last is a number but only one number. I would only want to get last if there are 2 or 3 numbers. The "-" would not be relevant to the 2 digits. if nothing is found that is desired then would return " " – atgold18 Jul 22 '20 at 14:58
  • 3
    Offtopic: You have some outstanding open questions. You may want to revisit them and if possible reward those who took time to answer your questions with upvotes and you may even consider to accept a working answer to close the thread. No obligations but it's simply how this site [works](https://stackoverflow.com/tour) – JvdV Jul 22 '20 at 15:50

3 Answers3

3

You can split the listed items and check for 3 digits via Like:

Function Get3Digits(s As String) As String
Dim tmp, elem
tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
For Each elem In tmp
    If elem Like "###" Then Get3Digits = elem: Exit Function
Next
If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")

End Function

Edited due to comment:

I would execute a 2 digit search when there are no 3 didget numbers before the end part and the last 2 digits are 2. if 3 digits are fount at end then get 3 but if not then get 2. there are times when last is a number but only one number. I would only want to get last if there are 2 or 3 numbers. The - would not be relevant to the 2 digets. if nothing is found that is desired then would return " ".

T.M.
  • 9,436
  • 3
  • 33
  • 57
  • The approach above assumes that you are wanting the *2-digit* alternative in case of a string connected by "-" delimiters only. It's no problem, however to change the condition and the delimiter replacements to include "-" as well, but I need an exact description of the applied logic. – T.M. Jul 22 '20 at 14:55
  • why it was assumed that I want to get 2 digits when separated by "-"?. it just happened to be that way for that example. It could be that is something to look at, however, In my description, I say that when the last 2 digits are 2 then we want those last 2 digits. so if there are no 3 digits anywhere and the last 2 digits are 2 to I would want to extract the 2 digits. – atgold18 Jul 22 '20 at 15:08
  • 1
    @atgold18 (1) yes, you can use the hyphen delimiters, too. (2) changed code corresponding to your comment; hopefully it'll be helpful :-) – T.M. Jul 22 '20 at 15:13
3

If VBA is not a must you could try:

=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")

enter image description here

It worked for your sample data.


Edit: Some explaination.

  • SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part to transform all three potential delimiters (hyphen, underscore and space) to valid XML node end- and startconstruct.

  • The above concatenated using ampersand into a valid XML construct (adding a parent node <t>).

  • FILTERXML can be used to now 'split' the string into an array.

  • //s[.*0=0][string-length()=3 or last() and string-length()=2] - The 2nd parameter of FILTERXML which should be valid XPATH syntax. It reads:

     //s                                                                 'Select all <s> nodes with
                                                                          following conditions:
     [.*0=0]                                                             'Check if an <s> node times zero
                                                                          returns zero (to check if a node
                                                                          is numeric.                              '
     [string-length()=3 or (position()=last() and string-length()=2)]    'Check if a node is 3 characters
                                                                          long OR if it's the last node and
                                                                          only 2 characters long.
    
  • INDEX(.....,1) - I mentioned in the comments that usually this is not needed, but since ExcelO365 might spill the returned array, we may as well implemented to prevent spilling errors for those who use the newest Excel version. Now we just retrieving the very first element of whatever array FILTERXML returns.

  • TEXT(....,"000") - Excel will try delete leading zeros of a numeric value so we use TEXT() to turn it into a string value of three digits.

Now, if no element can be found, this will return an error however a simple IFERROR could fix this.

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • It's unclear for me under what condition a 2-digit search has to be executed as for this case OP shows only tokens connected by "-" instead of the underscore or space delimiters. – T.M. Jul 22 '20 at 14:44
  • @T.M. This would use any of the three possible delimiters (Space, underscore or hyphen). Then it would check for any three digit number *or* the last element in the array if it happens to be a 2-digit number. Now the first element in the returned array is converted to a string of three digits (unless one got Excel365 in which case it could be handy to include `INDEX`). Hopefully that clears things up? – JvdV Jul 22 '20 at 14:50
  • 1
    Formula is way beyond my capabilities but it does produce results :). Not sure if OP wants a formula based solution but this for me definitely would be the first option – Zac Jul 22 '20 at 14:59
  • @Zac, if it's a pré then I don't mind explaining it a bit more in the answer itself. – JvdV Jul 22 '20 at 15:05
  • Explanation would be brilliant if you can – Zac Jul 22 '20 at 15:17
  • 2
    @Zac, see edit. If you are interested in the abilities of FILTERXML you may find [this](https://stackoverflow.com/q/61837696/9758194) interesting. – JvdV Jul 22 '20 at 15:45
  • @JvdV and @RonRosenfeld reveal to be the masters of the `FILTERXML()` function :+). – T.M. Jul 22 '20 at 15:47
  • Never even considered looking at XML to get the results. Thanks for the explanation to a very intuitive approach – Zac Jul 22 '20 at 15:47
2

Try this function, please:

Function ExtractThreeDigitsNumber(x As String) As String
   Dim El As Variant, arr As Variant, strFound As String
   
   If InStr(x, "_") > 0 Then
        arr = Split(x, "_")
   Elseif InStr(x, "-") > 0 Then
        arr = Split(x, "-")
   Else
        arr = Split(x, " ")
   End If
    For Each El In arr
        If IsNumeric(El) And Len(El) = 3 Then strFound = El: Exit For
    Next
    If strFound = "" Then
        If IsNumeric(Right(x, 2)) Then ExtractThreeDigitsNumber = Right(x, 2)
    Else
        ExtractThreeDigitsNumber = strFound
    End If
End Function

It can be called in this way:

Sub testExtractThreDig()
    Dim x As String
    x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
    Debug.Print ExtractThreeDigitsNumber(x)
End Sub
FaneDuru
  • 38,298
  • 4
  • 19
  • 27