4

Some similarly-named threads for this, but still couldn't solve my problem. I need to extract a fixed-length NUMBER value from an Excel string (8 digits in my scenario). Following Excel formula was provided for this purpose:

=MID(A1,FIND("--------",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"0","-"),"1","-"),"2","-"),"3","-"),"4","-"),"5","-"),"6","-"),"7","-"),"8","-"),"9","-")),8)

It does the job, however I have two issues with this:

  1. Most crucially - I'm looking for an exact match. While it does extract the first 8-digit sequence it finds, I'm really after only 8-digit numbers, meaning that 9-digit (or longer) numbers should be ignored (as 7-digit numbers already are). This formula also extracts first 8 digits from a longer number.

  2. Less important, but would be great to only look for numbers starting with 1. So, really just trying to extract this: 1??????? as a numeric value. So something like "a12891212a" or "a 12891212 a" should be extracted, meanwhile 128912120a or 23456789 should not.

If reasonably doable, I'd prefer an Excel formula-based approach compared to VBA. Any help is much appreciated!

braX
  • 11,506
  • 5
  • 20
  • 33
dotsent12
  • 137
  • 3
  • 17

5 Answers5

6

This could be done through formula quite alright, but all depends on your Excel version:

enter image description here


1) Excel 2016, you could still use a formula:

Formula in B1:

=IFERROR(MID(A1,MAX((MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)="1")*(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),8)))*(NOT(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))+8,1))))*(NOT(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))-1,1))))*(ROW(A$1:INDEX(A:A,LEN(A1))))),8),"Nothing found")

Note: This is an array formula and needs to be confirmed through CtrlShiftEnter


2) Excel 2019, using CONCAT() and FILTERXML():

Formula in B1:

=IFERROR(FILTERXML("<t><s>"&CONCAT(IF(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),"</s><s>"))&"</s></t>","//s[starts-with(., '1') and string-length(.) =8]"),"Nothing Found")

Note: This is an array formula and needs to be confirmed through CtrlShiftEnter


3) Excel 365, using previous mentioned functions but including SEQUENCE():

Formula in B1:

=IFERROR(FILTERXML("<t><s>"&LET(X,MID(A1,SEQUENCE(LEN(A1)),1),CONCAT(IF(ISNUMBER(--X),X,"</s><s>")))&"</s></t>","//s[starts-with(., '1') and string-length(.) =8]"),"Nothing Found")

The XPATH part of the formulas take care of the actual query, looking for strings that start with a '1' and are of a total length of '8'. This would then even work with strings like 'abc123456789abc12345678abc29876543' returning '12345678'.

If you enjoy FILTERXML and XPATH, then you might find this interesting.


4) Excel 365, insiders edition (time of writing) using TEXTSPLIT():

=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),Y,TEXTSPLIT(A1,IF(ISNUMBER(--X)," ",X),,1),FILTER(Y,(--LEFT(Y)=1)*(LEN(Y)=8),"Nothing Found"))

5) VBA: For if you must use VBA, I guess an UDF is a good option. Something like:

Function GetStr(str As String, pat As String) As String

With CreateObject("vbscript.regexp")
    .Pattern = pat
    .Global = True
    If .Test(str) = True Then
        GetStr = .Execute(str)(0).Submatches(0)
    Else
        GetStr = "Nothing found"
    End If
End With

End Function

You can call this in B1 as per =GetStr(A1,"(?:^|\D)(1\d{7})(?:\D|$)"). This is making use of a regular expression. If you are interested and want to learn more then this is an interesting read for you.

I left the pattern outside the UDF on purpose might you ever want to change it up. The current pattern can be seen in this online Demo, where from left to right the engine will look for:

  • (?: - 1st Non-capturing group
    • ^|\D - Either a start string ancor or anything other than a digit.
    • ) - Close 1st non-capturing group.
  • ( - 1st capture group.
    • 1\d{7} - Search for a literal 1 followed by 7 digits.
    • ) - Close 1st capture group.
  • (?: - 2nd Non-capturing group
    • \D|$ - Either anything other than a digit or an end string ancor.
    • ) - Close 2nd non-capturing group.

enter image description here

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • 1
    It's amazing & instructive for many users how you try to cover nearly any possibility :-) – T.M. Jun 21 '20 at 19:43
  • Big thanks, what great effort and know-how. Unfortunately didn't get it to work as I hoped for. Sorry, missed the proper tag, running Office365 here! Unfortunately don't get any matches from CONCAT&FILTERXML or SEQUENCE option, all end in error. Wonder if I failed with formula replication - e.g. string "AIR LIVON LTD UK8810109920 099920082709286949 INV NO 10628054 1094284" should fetch "10628054", but it produces an error. – dotsent12 Jun 21 '20 at 20:19
  • I do get matches with the 1st option - but again not 100% consistently. E.g. with the example of the same string, it fetches "1094284" instead of the correct one. – dotsent12 Jun 21 '20 at 20:24
  • @dotsent12, If you have Excel 365, then the option I put down works fine on the sample you provided, it yields the correct value on my end. Can you show me exactly how you typed your formula? And what language you using? Also, what exact error you getting? – JvdV Jun 22 '20 at 05:44
  • @dotsent12. Any progress? – JvdV Jul 04 '20 at 08:57
  • Sorry for the delay, I had a bit time off. Gave it another go and all the formula options started to work, hence accepted the answer, big thanks! Initially I thought that the problem was with the new array formulas (FILTER, SEQUENCE) that I don't think work all that great in a Table range, but after the second try, works fine and not producing the IFERROR value, as initially. – dotsent12 Jul 13 '20 at 21:41
  • Full disclosure - having experienced the issues at first, I re-considered and opted against extracting values from the string, but rather used XLOOKUP with wildcards and unprocessed strings. I think the end result was better and perhaps easier to grasp as well. Although don't think I'd made it there without tangling around with this option here. But this will prove useful sooner or later. – dotsent12 Jul 13 '20 at 21:44
3

Here is a simple User Defined Function that looks for sub-strings that are numerals. It creates an array of the sub-strings. It then looks for an element of that array that has length 8 and a leading character of 1:

Option Explicit

Public Function NineD(s As String) As String
    Dim L As Long, temp As String, wf As WorksheetFunction
    Dim i As Long, arr, a
    
    Set wf = Application.WorksheetFunction
    temp = s
    L = Len(s)
    
    For i = 1 To L
        If Mid(temp, i, 1) Like "[0-9]" Then
        Else
            temp = wf.Replace(temp, i, 1, " ")
        End If
    Next i
    
    arr = Split(wf.Trim(temp), " ")
    For Each a In arr
        If Len(a) = 8 And Left(a, 1) = "1" Then
            NineD = a
            Exit Function
        End If
    Next a
End Function

enter image description here

User Defined Functions (UDFs) are very easy to install and use:

  1. ALT-F11 brings up the VBE window
  2. ALT-I ALT-M opens a fresh module
  3. paste the stuff in and close the VBE window

If you save the workbook, the UDF will be saved with it. If you are using a version of Excel later then 2003, you must save the file as .xlsm rather than .xlsx

To remove the UDF:

  1. bring up the VBE window as above
  2. clear the code out
  3. close the VBE window

To use the UDF from Excel:

=NineD(A1)

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

and

http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx

and for specifics on UDFs, see:

http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx

Macros must be enabled for this to work!

Gary's Student
  • 95,722
  • 10
  • 59
  • 99
1

Here's formula based approach assuming data is in cell A2 which shall work in principle

=IFERROR(LOOKUP(2,1/( MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),8)* (LEN(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),8))=8)* (NOT(ISNUMBER(MID("a"&A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)+0)))* (NOT(ISNUMBER(RIGHT(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),9))+0)))* (MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)="1")),MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),8)),"")

It is lengthy as it is built primarily to generate 1 string and 4 tests.

MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),8) generates 8 character string.

(LEN(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),8))=8) tests if there are 8 characters

(NOT(ISNUMBER(MID("a"&A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)+0))) checks previous character is not a digit

(NOT(ISNUMBER(RIGHT(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),9))+0))) checks character next to the string being tested is not a digit

(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)="1") checks that first digit is one.

So if last 4 tests are true then multiplication produces 8 digit number which is used to calculate reciprocal which is checked using LOOKUP which is used to return the found result.

However, I will not recommend it from maintenance perspective.

shrivallabha.redij
  • 5,832
  • 1
  • 12
  • 27
  • Thanks, it actually does function, but similarly to JvdV-s 1st option, it does not recognize some of it correctly. Some examples: "10626872 BLUE RASPBERRY OU 22"; "INVO 10627724 OU TSETSEC" are unrecognized. edit: I'm probably spoiled by INDEX/MATCH & now XLOOKUP. In fact my real formula was RIGHT from the actual string. Seeing the solution containing LOOKUP, it does make a difference, right? – dotsent12 Jun 21 '20 at 20:33
  • @dotsent12, both are recognized at my end so I am confused by your response. – shrivallabha.redij Jun 22 '20 at 05:48
1

A simple VBA alternative via Split() function

Note that the Split() function will be splitted here to two parts only by its (optional) last argument 2; by default you'd get unlimited split parts.

As valid number sequences must start by "1", this digit reveals to be our delimiter for the splitting action.

Joining delimiter "1" to the sample string to be splitted allows referencing parts(1) without error, even if there's no delimiter ("1") at all.

    parts = Split(sample & "1", "1", 2)    ' arguments: string, delimiter, {limit}
Function EightDigits(sample, Optional chk As String = "N/A!") As String
Dim parts: parts = Split(sample & "1", "1", 2)   ' limit to 2 split parts only!
If Not Right(parts(0), 1) Like "#" Then          ' other preceding digit means no first "1"
    EightDigits = chk                            ' temporary string value "N/A!"
    chk = "1" & CStr(Val(parts(1)))              ' Val() cuts following non digits
End If
If Len(chk) = 8 Then EightDigits = chk           ' return only 8-digit numbers
End Function
T.M.
  • 9,436
  • 3
  • 33
  • 57
  • Looks nifty but it won't work on the more extended samples provided by OP. Something he unfortunately only mentioned in comments. It works for the samples in the actual question though. =) – JvdV Jun 22 '20 at 05:56
1

Another option using CONCAT+FILTERXML function

In B2, formula copied down :

=IFERROR(FILTERXML("<a><b>"&CONCAT(INDEX(TEXT(MID(SUBSTITUTE(A2," ","A"),ROW(A$1:INDEX(A:A,LEN(A2))),1),"0;;0;""</b><b>"""),0))&"</b></a>","//b[.>9999999][.<20000000]"),"not found")

enter image description here

bosco_yip
  • 3,762
  • 2
  • 5
  • 10