0

I am trying to capture all the product codes in the text. They sometimes are multiple. I am able to capture just one. I guess its more of the excel vba question rather than the regular expression as I have the str pattern.

 Sub regexp()

Dim regEx As New regexp
Dim strPattern As String
Dim Myrange As Range
Dim LastRow As Integer


LastRow = ActiveSheet.Cells(Rows.Count, "W").End(xlUp).Row
Set Myrange = ActiveSheet.Range("W4:W" & LastRow)

For Each c In Myrange


strPattern = "(?:\s[ABCDabcd][0-9][A-Za-z0-9]{3}\s|\s[ABCDabcd][0-9oO][0-9oO)][0-9][0-9A-Za-z]\s|\s[ABCDabcd][0-9oO][0-9A-Za-z)][0-9A-Za-z][0-9]\s|[^A-Za-z0-9][ABCDabcd]\s[0-9][A-Z0-9a-z]{3}\s)"
    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        Set matches = regEx.Execute(c.Value)
        On Error Resume Next
        c.Offset(0, 7).Value = matches.Item(0)
    End If
Next c
End Sub

How do I change the code to capture distinct multiple string pattern in the text. This code currently only captures the first one. I want to capture all seperated by a comma. Eg- First I had problems with the product b0067. Then issues were resolved. Later I had issues with this D0887 . This was followed by C689W, D 7890 OUTPUT- b0067,D0887,C689W,D 7890 I want to trim these as the spaces were considered at the end or beginning.

Community
  • 1
  • 1
viji
  • 425
  • 2
  • 6
  • 16
  • You just use the first item - `matches.Item(0)`. Iterate over the collections and output the data where you want. Where do you want to output the matches? However, I also think your pattern is very slow, since your alternatives are not optimized. You may also fail to extract 2 values from `" A0ABC B0ABC "` string. Do you really need to match a whitespace at the end, or just check if there is a whitespace? – Wiktor Stribiżew May 15 '17 at 19:33
  • I am right now outputing after 7 columns in AD column by using c.offset(0,7). The data is messy so wanted to ideally capture 5 characters but the best way I thought was \s but I do miss if they end with full stop"." – viji May 15 '17 at 19:46

1 Answers1

0

First, I suggest enhancing the regex to

strPattern = "(?:\s[abcd](?:[0-9][a-z0-9]{3}|[0-9o][0-9o)][0-9][0-9a-z]|[0-9o][0-9a-z)][0-9a-z][0-9])|[^a-z0-9][abcd]\s[0-9][0-9a-z]{3})(?!\S)"

See the regex demo. Make sure you set .IgnoreCase = True. This regex will check if there is a whitespace or end of string after your match with a (?!\S) lookahead, so A0ABC and B0ABC will be extracted from "1 A0ABC B0ABC 3". The first \s can be changed to (^|\s) to also match the strings at the start of a string.

Then, after running Execute(), iterate over the matches:

 Set matches = regEx.Execute(c.Value)
 For Each m In matches
    Cells(x, y) = m.Value
 Next

This is a whole fixed sub that will print the matches as a comma separated value to the cell you specified:

Dim regEx As New regexp
Dim strPattern As String
Dim Myrange As Range
Dim LastRow As Long, cnt As Long

LastRow = ActiveSheet.Cells(Rows.Count, "W").End(xlUp).Row
Set Myrange = ActiveSheet.Range("W4:W" & LastRow)
strPattern = "(?:\s[abcd](?:[0-9][a-z0-9]{3}|[0-9o][0-9o)][0-9][0-9a-z]|[0-9o][0-9a-z)][0-9a-z][0-9])|[^a-z0-9][abcd]\s[0-9][0-9a-z]{3})(?!\S)"

If strPattern <> "" Then
  With regEx
    .Global = True
    .IgnoreCase = True
    .Pattern = strPattern
  End With
  cnt = 0
  For Each c In Myrange
    Set matches = regEx.Execute(c.Value)
    For Each m In matches
        c.Offset(0, 7).Value = c.offset(0, 7).Value + m.Value
        cnt = cnt + 1
        If cnt < matches.Count Then c.offset(0, 7) = c.offset(0, 7) & ","
    Next
  Next c
End If

Note the place where the RegExp is initialized - outside the loop.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • I reverted the link to the demo (with a PCRE settig) so that formatting could be ignored. – Wiktor Stribiżew May 15 '17 at 20:07
  • y should be 30(AD column) what should my x value be and how should I change x value – viji May 16 '17 at 02:53
  • @viji: I do not see a problem with printing the values. You may use a counter and increment it every time a new value is retrieved from the collection. And then either increment row or column value. – Wiktor Stribiżew May 16 '17 at 06:51
  • Sorry I am not understanding. I would like to see the results in column AD for each corresponding string seperated by a comma. In my above example W is where my string is and I want to see multiple values if they occur in the string to be in AD where they are separated by comma – viji May 16 '17 at 20:07
  • @viji: See the full fixed sub. – Wiktor Stribiżew May 16 '17 at 20:30
  • I changed my regular exp- to this- (?: (?!\[^a-z0-9])[abcd](?:[0-9][a-z0-9]{3}|[0-9o][0-9o)][0-9][0-9a-z]|[0-9o][0-9a-z)][0-9a-z][0-9]) | [^a-z0-9][abcd]\s[0-9][0-9a-z]{3} )(?!\[^a-z0-9]) With this eg: codes B124B and B124C are more common. Thus this B124C is hard Product codes are B0751, B0752, B0756, C0757, C0761 or B0762 When I executed this in demo site it works like a charm but in my macro. I am getting error at -set matches=. The other issue is sometimes I am not getting comma between and sometimes getting. I also wanted distinct product codes only. – viji May 17 '17 at 20:43
  • @viji: I do not understand: you changed the regex to something else and want the same behavior? Why add negative lookaheads? What does your new regex do/what are the requirements? – Wiktor Stribiżew May 17 '17 at 21:08
  • 1st line:codes B124B and B124C more commonThus this B124C is hard 2nd line:Product codes are B0751, B0752, B0756, C0757, C0761 or B0762 ---Output1st - B124B, B124C Output2nd B0751, B0752, B0756, C0757, C0761, B0762. But my first regular expression does not capture all of these. So I am trying this not sure of complicated negative lookahead – viji May 18 '17 at 01:59
  • All I am saying is if there is a comma after the product code it should recognise and extract the remaining eg:"Product codes are B0751, B0752, B0756, C0757, C0761 or B0762". Output I am expecting is- B0751, B0752, B0756, C0757, C0761, B0762 But the present reg ex only picks C0761 as there is no comma. other thing is I want to pick only unique codes from a string/record in the vba code – viji May 19 '17 at 14:18
  • Please add all details, examples, new requirements, to the **question** and remove the comments. It is impossible to see the formatting in comments. – Wiktor Stribiżew May 19 '17 at 14:21
  • @viji: Try [this regex](https://regex101.com/r/ofXVhI/3). Dupe entries can be removed using [VBA code](https://stackoverflow.com/questions/11870095/vba-remove-duplicates-from-array). – Wiktor Stribiżew May 22 '17 at 13:24