4

I have a bunch of data which contains any number of 5-digit strings in completely inconsistent formats, and i want to extract these 5-digit strings (in bold) out. I am not bothered about strings containing less than or more than 5-digits. as an example, this is the kind of data i have in my file

Cell A1: "1. 76589 - wholesale activities. 2. 33476 - general"

Cell A2: "WHOLESALE ACTIVITIES (76589). SHIPPING (12235). REAL ESTATE ACTIVITIES (67333)"

Cell A3: "1. 33476 General. 658709 annual road. Unknown 563"

I've tried the usual SEARCH/FIND, MIN, LEFT/RIGHT/MID functions, but am not sure how to get them to produce the result i need, and even text-to-columns wasn't giving me a clean result

thanks in advance

vrgl
  • 43
  • 1
  • 4
  • 1
    Cell A1 contains two 5 digit number. How do you want your output to be? – Gowtham Shiva Apr 27 '17 at 10:44
  • hi gowtham shiva, ideally, i would like each of the 5-digit strings to be extracted to the subsequent columns. in the example above, this would mean B1: "76589", C1: "33476" – vrgl Apr 27 '17 at 16:12
  • You would need regex with vba. try some of the answers. Add excel-vba and regex tags in your question – Gowtham Shiva Apr 27 '17 at 16:16

3 Answers3

2

Here is a macro that will split your line into the columns as you requested.

The range being processed is whatever you have selected. The results are written into the adjacent columns on the same row.

Depending on your worksheet setup, you may want to "clear out" the rows where the results are going before executing the extraction code.

You can also write code to select the data to be processed automatically. Plenty of examples on this forum.


Option Explicit
Sub Extract5Digits()
    Dim R As Range, C As Range
    Dim RE As Object, MC As Object, M As Object
    Dim I As Long

Set R = Selection
Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .Pattern = "\b\d{5}\b"
    For Each C In R
        If .test(C.Text) = True Then
            I = 0
            Set MC = .Execute(C.Text)
            For Each M In MC
                I = I + 1
                C.Offset(0, I) = M
            Next M
        End If
    Next C
End With
End Sub

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • hi ron, thank you very much. this worked exactly according to my requirements. it also showed me that i should have added a few disclaimers in my original question - that i really suck at visual basic, so i needed very specific, step-by-step instructions. your answer provided all of that necessary for a beginner like me to make it work – vrgl May 02 '17 at 09:51
0

Simply with Excel functions this is impossibile.

The best way for you is to use the Regex 55 library in VBA.

Let's consider this example:

+---+--------------------------------------------------------------+
|   |                              A                               |
+---+--------------------------------------------------------------+
| 1 | Cell A3: "1. 33476 General. 658709 annual road. Unknown 563" |
| 2 | 33476                                                        |
+---+--------------------------------------------------------------+

From the Excel file hit Alt + F11, then go to Tools => Reference and select "Microsoft VBScript Regular Expression 5.5".

Then you can use the following function definition:

Public Function Get5DigitsNumer(search_str As String)
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches
    GetStringInParens = ""
    regEx.Pattern = "[0-9]{5}"
    regEx.Global = True
    If regEx.test(search_str) Then
        Set matches = regEx.Execute(search_str)
        GetStringInParens = matches(0).SubMatches(0)
    End If
End Function

At this time you can use the following code:

Sub PatternExtractor()
    Range("A2").Value = Get5DigitsNumer(Range("A1"))
End Sub

which take the value of cell A1 and extract the 5 digits numer, thn the result is saved into cell A2.

At the time I don't have any idea how this code could work where the same cell contains more than one time; like "Cell A1: "1. 76589 - wholesale activities. 2. 33476 - general" in your example.

I suggest you to have a look at this answer. The pattern is different but the question is really similar to yours.

Community
  • 1
  • 1
Nicolaesse
  • 2,554
  • 12
  • 46
  • 71
  • Since the OP wanted to have **exactly** five digits, I suggest you add the word boundary tokens to your regex pattern – Ron Rosenfeld Apr 27 '17 at 16:12
  • hi Nicolaesse, thank you very much for the detailed reply. ill give your sugges. tion a shot once i get back to my working file hi Ron Rosenfeld, thank you for the suggestion. my knowledge of vba is rather rudimentary. where should i add the word boundary tokens in Nicolaesse's code? – vrgl Apr 27 '17 at 16:16
  • hi nicolaesse, thank you again for your reply. unfortunately, i couldnt get your code to work. im 99% sure it is because my lack of knowledge in this area is causing me to implement it wrongly though – vrgl May 02 '17 at 09:52
0

The only way that you can do it is by writing a regex in VBA. I would recommend you to look at this question.

Community
  • 1
  • 1
Gorkem
  • 35
  • 1
  • 6