2

I have an excel sheet with two columns. The first column is the key phrase and the second is the messages. The key phrase may occur in the messages column. I need to know how many times a key phrase has occurred in messages column. please suggest some good and easy way of finding.

The key phrase is one column and the messages is the second column. The messages column is combination (concatenation) of 1 or more than 1 key phrases. I need to find out that how many key phrases does each message contain.

Megan
  • 876
  • 2
  • 10
  • 20
  • 1
    If the Excel data is the result of an SQL query, I'd recommend sculpting the T-SQL with a subquery or partition that delivers the results along with the rest of the data. –  Sep 30 '15 at 08:05
  • The messages column can have more than one key phrase that is present in the first column. E,g, if the key phrase is say " this is to be tested" , then this string can occur in messages column as substring in many rows. e.g "hello world; this is to be tested; and this also is being tested;". this sort of values are available in column B. – Megan Sep 30 '15 at 11:59
  • Yes, that is what a sub-query does in a SELECT statement. –  Sep 30 '15 at 12:18

2 Answers2

4

It's possible that you might be able to collect an efficient count with a module sub procedure that performs all of the maths in memory arrays¹ and returns the counts to the worksheet.

      Counts keywords in phrases sample data

I've used some standard Lorem Ipsum keywords and phrases to create the above sample data.

Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the following into the new module code sheet titled something like Book1 - Module1 (Code).

Option Explicit

Sub count_strings_inside_strings()
    Dim rw As Long, lr As Long
    Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant

    ReDim vKEYs(0)
    ReDim vPHRASEs(0)

    With Worksheets("Sheet1")   '<~~ set to the correct worksheet name\
        'populate the vKEYs array
        For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
            vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
            ReDim Preserve vKEYs(UBound(vKEYs) + 1)
        Next rw
        ReDim Preserve vKEYs(UBound(vKEYs) - 1)

        'populate the vPHRASEs array
        For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
            vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
            ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
        Next rw
        ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
        ReDim vCOUNTs(0 To UBound(vPHRASEs))

        'perform the counts
        For p = LBound(vPHRASEs) To UBound(vPHRASEs)
            For k = LBound(vKEYs) To UBound(vKEYs)
                vCOUNTs(p) = CInt(vCOUNTs(p)) + _
                    (Len(vPHRASEs(p)) - Len(Replace(vPHRASEs(p), vKEYs(k), vbNullString))) / Len(vKEYs(k))
            Next k
        Next p

        'return the counts to the worksheet
        .Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)

        'run the helper procedure to Blue|Bold all of the found keywords within the phrases
        Call key_in_phrase_helper(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))

    End With
End Sub

Sub key_in_phrase_helper(vKYs As Variant, rPHRSs As Range)
    Dim p As Long, r As Long, v As Long

    With rPHRSs
        For r = 1 To rPHRSs.Rows.Count
            .Cells(r, 1) = .Cells(r, 1).Value2
            For v = LBound(vKYs) To UBound(vKYs)
                p = 0
                Do While CBool(InStr(p + 1, .Cells(r, 1).Value2, vKYs(v), vbTextCompare))
                    p = InStr(p + 1, .Cells(r, 1).Value2, vKYs(v), vbTextCompare)
                    Debug.Print vKYs(v)
                    With .Cells(r, 1).Characters(Start:=p, Length:=Len(vKYs(v))).Font
                        .Bold = True
                        .ColorIndex = 5
                    End With
                Loop
            Next v
        Next r
    End With
End Sub

You may have to rename the worksheet to be processed in the 5th code line. I've also included a helper routine that identifies the key words within the phrases with a Blue|Bold font. Comment out or delete the Call key_in_phrase_helper(...) line at the bottom of the first sub procedure if this is not desired.

Tap Alt+Q to return to your worksheet. Tap Alt+F8 to open the Macros dialog and Run the sub procedure. If you data resembles the sample data I've put together then you should have similar results.

      Counts keys in phrases


¹ These are some advanced methods but I feel that they are also the best way to tackle your problem. If you have specific questions that your own research does not adequately explain, I will try to address them in the Comments section. The sample workbook I created to create this solution can be made available on request.

  • This is awesome. Thanks a lot for the reply. It works perfectly. – Megan Oct 01 '15 at 03:41
  • That's great to hear. I had to make some assumptions during the process but it seems that I guessed right. –  Oct 01 '15 at 03:44
  • One more help required. Some of the key phrases have some dates and numbers, which in the original message may vary but the rest of the message remains same. e.g the key phrase is "Id creation xx-xx-xx fails" and the message is "Id creation 9823023 fails". Could you please help me in this that it ignores the number while matching. – Megan Oct 01 '15 at 04:17
  • Adding that functionality would mean involving RegEx (Regular Expression) and some fuzzy logic. In short, that would break this effort and require a complete rewrite. Given that a lack of sample keywords and phrases led to this shortcoming, why not [close off this question](http://stackoverflow.com/help/someone-answers) and initiate another where you can outline the new requirements. You might even drop off a quick comment here regarding the new question as I have some ideas on how to proceed. –  Oct 01 '15 at 11:04
  • I can not post another question for next one week :( – Megan Oct 02 '15 at 01:52
  • Sorry to say, but that might have something to do with your posting history. Typically, you seem to abandon a question after receiving an answer. Accepting one of the replies as your 'Accepted Answer' by clicking the checkmark found in the left-hand alley beside the best answer and marking one or more as 'Helpful' by clicking the ▲ helps build both your site reputation and the people answering your inquiries. See [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers). More reputation means that you can post more questions. –  Oct 02 '15 at 02:18
  • I have raised another question. This is the link. please check: https://stackoverflow.com/questions/32946884/pattern-matching-in-excel – Megan Oct 05 '15 at 10:56
  • See my reply involving RegEx pattern matching [here](https://stackoverflow.com/questions/32946884/pattern-matching-in-excel/32960315#32960315). –  Oct 06 '15 at 04:16
0

You can use this formula COUNTIF(B:B;"*"&A2&"*") starting from the second row.

Dzmitry Paliakou
  • 1,587
  • 19
  • 27
  • Could you please help me understand this.sorry i am not a frequent user of excel. – Megan Sep 30 '15 at 11:59
  • `B:B` is a refering to the entire column B, where you have the messages. The formula will look over it. `A2` is a refering to the cell in the second row of the column A where you have the key. Asterisk(`*`) is a wildcard for searching. Ampersand (&) is an operator for concatenating strings. F. e., if you have a key "excel" in the column A and a message "Excel 2016 is compatible with Windows 10" in the column B, you need to build search pattern `*excel*`. That's why you use the string parts `"*"` in the start and the end of your formula parameter. And you concatenate it with the key using `&`. – Dzmitry Paliakou Sep 30 '15 at 13:37
  • Thanks, But sorry, I have one doubt. Will this be comparing only one element that is A2 with entire column B:B?. I need to compare all values of column A with all values of coulmn B. – Megan Sep 30 '15 at 23:31
  • @Megan, try insert the formula `=(SUMPRODUCT(LEN(B:B))-SUMPRODUCT(LEN(SUBSTITUTE(B:B;A2;""))))/LEN(A2)` into the cell C2 and then expand it down – Dzmitry Paliakou Oct 01 '15 at 12:16