-1

Sorry if this is a stupid question but i've been racking my brain for a couple of days now and i can't seem to come up with a solution to this.

I have a list of phrases and a list of keywords that need to be searched, extracted and replaced.

For example i have the following list of keywords in sheet 1 column A that need to be extracted and replaced with the keywords in column B.

red      -     orange

blue     -     violet

green    -     pink

yellow   -     brown

And in sheet 2 I have a list of phrases in column A.

The girl with blue eyes had a red scarf.

I saw a yellow flower.

My cousin has a red car with blue rims and green mirrors.

And I want to extract in column B the keywords that are matched for every phrase in the exact order that they appear like so:

COLUMN A                                                        COLUMN B

The girl with blue eyes had a red scarf.                        violet, orange

I saw a yellow flower.                                          brown

My cousin has a red car with blue rims and green mirrors.       orange, violet, pink

Is there any way this can be achieved either by formula or VBA? Also this needs to be usable with Excel 2016 so i can't use fancy functions like "TEXTJOIN".

Thank you everyone in advance!

Cheers!

L.E.

I was able to find some code that almost does what I need it to do but it does not keep the correct order.

Is there anyway it could be modified to generate the desired results? Unfortunately I'm not that good with VBA. :(

Sub test()
Dim datacount As Long
Dim termcount As Long

datacount = Sheets("Sheet1").Cells(Rows.Count, "A").End(xlUp).Row
termcount = Sheets("Sheet2").Cells(Rows.Count, "A").End(xlUp).Row

For i = 1 To datacount

    dataa = Sheets("Sheet1").Cells(i, "A").Text
    
    result = ""
            
    For j = 1 To termcount
    
        terma = Sheets("Sheet2").Cells(j, "A").Text
        termb = Sheets("Sheet2").Cells(j, "B").Text
        
        If InStr(dataa, terma) > 0 Then
        
             
        If result = "" Then
           
           result = result & termb
           
        Else
        
            result = result & ", " & termb
            
        End If
            
        End If
        
    Next j
    
       Sheets("Sheet1").Cells(i, "B").Value = result
    
    Next i
End Sub
Adrian
  • 3
  • 2
  • As far as I know, there is no way of doing this with normal Excel Formulas. You'll need to code something in VBA to study each single word in each phrase (probably spliting), and then match them in same order – Foxfire And Burns And Burns Mar 16 '21 at 07:44

2 Answers2

0

I would suggest you use Power Query which is a built-in function since Excel 2013.

Suppose the text strings of colours on your Sheet1 is in a Table named Tbl_LookUp Tbl_LookUp

Suppose the phrases on your Sheet2 is in another Table named Tbl_Phrases Tbl_Phrases

Go to the Data tab of your Excel and load both tables to the Power Query Editor (you can google how to load data from a table to the PQ Editor in Excel 2016). Please note the screenshot is from Excel 365. Data Load Table

Once loaded, go to the Tbl_Phrases query, and action the following steps:

  1. Add an indexed column starting from 1 Add Index Column
  2. Split the Phrases column by delimiter, use space as the delimiter and choose to put the outcome into rows Split Column
  3. Merge the current query with the Tbl_LookUp query, use the Phrase column to match the Old Text column Merge with Look Up Table
  4. Expand the new column to show contents from New Text column Expand the table Show New Text
  5. Group the New Text column by the Index column, you can choose to sum the values in the New Text column, and it will come up as an error after the grouping. Go to the formula field and replace this part of the formula List.Sum([New Text]) with Text.Combine([New Text],", "). Hit enter and the error will be corrected to the desired text string. Group Texts

The following is the full M Code for the above query. You can copy and paste it in the Advanced Editor without manually going through each step:

let
    Source = Excel.CurrentWorkbook(){[Name="Tbl_Phrases"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Phrases", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrases"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrases", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type1", {"Phrases"}, Tbl_LookUp, {"Old Text"}, "Tbl_Replace", JoinKind.LeftOuter),
    #"Expanded Tbl_Replace" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Replace", {"New Text"}, {"New Text"}),
    #"Grouped Rows" = Table.Group(#"Expanded Tbl_Replace", {"Index"}, {{"Look up color", each Text.Combine([New Text],", "), type nullable text}})
in
    #"Grouped Rows"

When you finish adding an index column in the Tbl_Phrases query, which is Step 1 from the above, you can make a copy of the query (simply right click the original query and select "duplicate"), then you will have a second query called Tbl_Phrases (2). No need to work on this query until you finish editing the original query ended up with desired text strings.

Then you can merge the Tbl_Phrases (2) query with the Tbl_Phrases query using the index column. Expand the new column to show the content from the look up colour column. Lastly, merge the Phrases column with the look up color column with delimiter (space)-(space), and remove the index column, then you should have the desired text string.

Merge Columns

Here is the M Code for the Tbl_Phrases (2) query. Just a reminder, you must finish with the Tbl_Phrases query first otherwise the merging query step will lead to an error:

let
    Source = Excel.CurrentWorkbook(){[Name="Tbl_Phrases"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, Tbl_Phrases, {"Index"}, "Tbl_Phrases", JoinKind.LeftOuter),
    #"Expanded Tbl_Phrases" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Phrases", {"Look up color"}, {"Look up color"}),
    #"Merged Columns" = Table.CombineColumns(#"Expanded Tbl_Phrases",{"Phrases", "Look up color"},Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"Merged"),
    #"Removed Columns" = Table.RemoveColumns(#"Merged Columns",{"Index"})
in
    #"Removed Columns"

You can then load the Tbl_Phrase (2) query to the desired worksheet within the same workbook (or to somewhere on Sheet2). Outcome

Let me know if you have any questions.

Terry W
  • 3,199
  • 2
  • 8
  • 24
  • Thank you for the detailed tutorial! However i never used power query before so I kinda got stuck at step 5. Can you post some screenshots of the grouping process? Also how do I get to the formula field to replace this part of the formula List.Sum([New Text]) with Text.Combine([New Text],", "). ? At the same time, sorry for the churn but i don't want a merged final result. I need the keywords to be extracted and replaced in column B. I edited the original post in order to better visualize this. Could you maybe help me out with this as well? – Adrian Mar 16 '21 at 13:01
  • @Adrian at step 5, you select the [New Text] column, click the `Group By` function, you will see a window asking to set up the parameters for setting up the grouping. You should group the [New Text] column by the [Index] column, and choose the grouping method to be "Sum" in the first instance. Confirm the grouping set up and the query will end up with errors. Then you can go to the formula bar to replace the original `List.Sum` function with the `Text.Combine` function. – Terry W Mar 16 '21 at 22:16
  • @Adrian Regarding your amended outcome, you can still follow up the second part of the answer but simply skip the merging column part, just remove the index column after the query merge and expand. – Terry W Mar 16 '21 at 22:16
0

You can do this with a User Defined Function making use of Regular Expressions.

The worksheet formula:

=matchWords(A2,$K$2:$L$5)

where A2 contains the sentence, and the second argument points to the translation table (which could be on another worksheet).

The code

Option Explicit
Function matchWords(ByVal s As String, translTbl As Range) As String
    Dim RE As Object, MC As Object, M As Object
    Dim AL As Object 'collect the replaced words
    Dim TT As Variant
    Dim I As Long
    Dim vS As Variant
'create array
TT = translTbl

'initiate array for output
Set AL = CreateObject("system.collections.arraylist")

'initiate regular expression engine
Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .ignorecase = True 'could change this if you want
    .Pattern = "\w+" 'can change this if need to include some non letter/digit items
    
    'split the sentence, excluding punctuation
    If .test(s) Then
        Set MC = .Execute(s)
            For Each M In MC
                For I = 1 To UBound(TT, 1)
                    If M = TT(I, 1) Then AL.Add TT(I, 2)
                Next I
            Next M
    End If
End With
    
matchWords = Join(AL.toarray, ", ")
    
End Function

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Hi Ron! Thank you for the solution but from what i can see from your screenshot, your output is different from the one I want. Your function changed the words in the initial phrase and moved the original keywords as the output. I don't want to replace the words inside the original phrases. I want to look for the keywords, if they exist then i want to extract them in a separate column and replace them there. – Adrian Mar 16 '21 at 13:41
  • @Adrian I misunderstood. What you want is even simpler. I have modified my answer. – Ron Rosenfeld Mar 16 '21 at 14:09
  • Hello again Ron! This looks exactly like what I need! I see here this line of code: .Pattern = "\w+" 'can change this if need to include some non letter/digit items To what should I change it? Because I definitely need to include numbers and special characters as well. – Adrian Mar 17 '21 at 05:18
  • Also how can I have the matched words auto populate at the push of a button? Let's say I have in Sheet 1 a table with the Phrases in column A and the matched words in column B and the translation table in a separate table in sheet 2 and a button on sheet 1 that when pressed will auto fill column B of matched words? – Adrian Mar 17 '21 at 05:27
  • I have just tried the function on my end and it gives me a name error. #NAME? Any idea on why this would happen? – Adrian Mar 17 '21 at 05:55
  • @Adrian `#NAME?` Typo; naming the module the same as the UDF; not having the UDF in the proper module in the workbook from where you are working; etc – Ron Rosenfeld Mar 17 '21 at 10:18
  • @Adrian `populate with button push`. If you don't want the results to update automatically when you change things, you can rewrite the UDF as a Macro (Sub) and program a button to trigger it at will. Of course, you might want to also do something so you can tell if the results are in sync with the data. – Ron Rosenfeld Mar 17 '21 at 10:20
  • @Adrian You would change `\w+` to whatever regex matches what you need to match and does not produce false matches. A tutorial on constructing valid regexes is not something that can be covered in a comment -- there are books about it. I'd suggest starting with [How to use Regular Expressions in Microsoft Excel](https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops) and also do an internet search for tutorials in regular expressions. – Ron Rosenfeld Mar 17 '21 at 10:27
  • @Adrian *By the way, contrary to what you wrote, I wrote that `\w+ does *not* include non digit/letters`. To me that implies that it **does** include letters and digits. It also includes the underscore.* More succinctly represented as being equivalent to `[A-Za-z0-9_]` – Ron Rosenfeld Mar 17 '21 at 10:28
  • Hi Ron! You are awesome! Got it to work and it works flawlessly! However, it does not extract numbers. If i have the phrase "This color is 10% yellow" and I add the following rules for translation Yellow - Brown and 10% - 15%, then the numbers are completely ignored. It recognizes the Yellow - Brown rule and it returns Brown but that's it. I guess i better get started on those regexes :D. Thank you again! – Adrian Mar 17 '21 at 12:05
  • @Adrian For those examples, you might try changing the regex to `\w+%?` in order to accept, but not require, a trailing `%` in a word. – Ron Rosenfeld Mar 17 '21 at 21:52