-2

I have an excel sheet with two columns. The first column is the key phrase and the second is the messages. The key phrase may occur in the messages column. I need to know how many times a key phrase has occurred in messages column.

The key phrase is one column and the messages is the second column. The messages column is combination (concatenation) of 1 or more than 1 key phrases. I need to find out that how many key phrases does each message contain. Also some of the messages have some dates and numbers with them. Also some of the messages have dates and numbers in them, the matching key phrase as that date/numbe as (xx-xxx-xxxx) presently.

e.g. the message is "The deal closed on 08-Oct-2014 so no further transaction allowed" and the key phrase is "The deal closed on (xx-xxx-xxxx)". Also there are messages as "Deal number 4238428DDSSD has problems" and the keyphrase is "Deal number xxxxxxxx hass problems". The regex matching is required.

Megan
  • 876
  • 2
  • 10
  • 20
  • 1
    Can you show us what progress you've already made? You might want to have a look at [this post](http://stackoverflow.com/a/22542835/4600127), explaining Regular Expressions in VBA in great detail! – Verzweifler Oct 05 '15 at 11:29

1 Answers1

2

You can pick a few keyword phrases, create the regex pattern for them, then encode the phrases such that a Range.Replace method can be used on them to substitute an appropriate RegEx pattern mask into the keyword phrase.

In the following, I've used X00000000X, XSHORTDATEX and XDEALNMBRX as placeholders within the keywords. These will be replaced with [0-9,-]{7,8}, [0-9,-]{3}[a-z]{3}[0-9,-]{3,5} and [0-9]{7}[a-z]{5} respectively.

X00000000X is designed to handle anything that looks like 1234567 or * 99-11-00*. XSHORTDATEX will handle dates in the dd-mmm-yy or dd-mmm-yyyy format (once converted to lower case) and XDEALNMBRX will locate alphanumeric patterns similar to 4238428DDSSD.

This code requires that the Microsoft VBScript Regular Expression library be added to the VBA project with the VBE's Tools ► References command.

Sub count_strings_inside_strings_rgx()
    Dim rw As Long, lr As Long
    Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant
    Dim sPATTERN As String, vbaRGX As New RegExp, cMATCHES As MatchCollection

    ReDim vKEYs(0)
    ReDim vPHRASEs(0)
    
    With Worksheets("Sheet1")   '<~~ set to the correct worksheet name\
        'populate the vKEYs array
        For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
            vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
            ReDim Preserve vKEYs(UBound(vKEYs) + 1)
        Next rw
        ReDim Preserve vKEYs(UBound(vKEYs) - 1)
        
        'populate the vPHRASEs array
        For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
            vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
            ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
        Next rw
        ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
        ReDim vCOUNTs(0 To UBound(vPHRASEs))
        
        For p = LBound(vPHRASEs) To UBound(vPHRASEs)
            For k = LBound(vKEYs) To UBound(vKEYs)
                sPATTERN = Replace(vKEYs(k), "x00000000x", "[0-9,\-]{7,8}")
                sPATTERN = Replace(sPATTERN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                sPATTERN = Replace(sPATTERN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                sPATTERN = Replace(sPATTERN, "xshortwrapdatex", "\([0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}\)")
                With vbaRGX
                    .Global = True
                    .Pattern = sPATTERN
                    Set cMATCHES = .Execute(vPHRASEs(p))
                End With
                vCOUNTs(p) = vCOUNTs(p) + cMATCHES.Count
            Next k
        Next p
        
        .Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)
        
        Call key_in_phrase_helper_rgx(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))
        
    End With
    
    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

Sub key_in_phrase_helper_rgx(vKYs As Variant, rPHRSs As Range)
    Dim c As Long, m As Long, p As Long, r As Long, v As Long, sPTTRN As String
    Dim vbaRGX As New RegExp, cMATCHES As MatchCollection
    
    With rPHRSs
        For r = 1 To rPHRSs.Rows.Count
            With .Cells(r, 1)
                .ClearFormats
                For v = LBound(vKYs) To UBound(vKYs)
                    sPTTRN = Replace(vKYs(v), "x00000000x", "[0-9,\-]{7,8}")
                    sPTTRN = Replace(sPTTRN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                    sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                    sPTTRN = Replace(sPTTRN, "xshortwrapdatex", "\([0-9,\-]{2,3}[a-z]{3}[0-9,\-]{3,5}\)")
                    c = 5 + CBool(vKYs(v) <> sPTTRN) * 2
                    Debug.Print sPTTRN
                    With vbaRGX
                        .Global = True
                        .Pattern = sPTTRN
                    End With
                    Set cMATCHES = vbaRGX.Execute(LCase(.Value2))
                    For m = 0 To cMATCHES.Count - 1
                        p = 0
                        Do While CBool(InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare))
                            p = InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare)
                            'Debug.Print vKYs(v)
                            With .Characters(Start:=p, Length:=Len(cMATCHES.Item(m))).Font
                                .Bold = True
                                .ColorIndex = c
                            End With
                        Loop
                    Next m
                Next v
            End With
        Next r
    End With
    
    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

In the following image of me sample's results, the staight location items are noted in bold}blue and the RegEx pattern matching is noted by bold|red.

  regex keyword from phrases

Feel free to modify and append with additional keywords, phrases and RegEx patterns.

Community
  • 1
  • 1
  • I am facing some issues while running it. – Megan Oct 06 '15 at 04:19
  • I've added directions for including the required library to the project. If this does not resolve the issue, please provide more detail on what error you are receiving and what line is pointed to on the error. [LINK](https://www.dropbox.com/s/e7n24qv9gbvn4bb/count_strings_in_strings_and_highlight.xlsb?dl=0) to my sample workbook. –  Oct 06 '15 at 04:51
  • Could you please check once, the pattern matching is not working for dates. The string "effective date (30-Sep-2014) may not precede the transaction date" is not getting matched with KeyPhrase "effective date (xx-xxx-xxxx) may not precede the transaction date". I had replaced sPTTRN = Replace(sPTTRN, "xx-xxx-xxxx", "[0-9,\-]{2,3}[A-Z][a-z]{3}[0-9,\-]{3,5}"). – Megan Oct 07 '15 at 05:13
  • I've added another pattern. Your keyword phrase would be **effective date (XSHORTWRAPDATEX) may not precede**. The new pattern needs to go into two places; one for each sub. –  Oct 07 '15 at 05:43
  • For Alphanumeric as well, sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}") seems not matching – Megan Oct 07 '15 at 06:21
  • The alpha numeric pattern was designed to handle 7 digits followed by 5 characters within the exact phrase provided; no more, no less. It was tested and performed well on the samples you provided (see B13 and B16 of the example workbook). –  Oct 07 '15 at 06:58