2

I have two columns and I am trying to union the text in these two columns and store the result in the third. In the example data below, Union is the output I am trying to achieve. When I concatenate col1 and col2 data, there are duplicates in the result. Is there a way to get rid of the duplicates in Excel using power query? Or should I write a C# program and use union operator to get the result set I am looking for. Thanks

Col1   Col2       Union
 ab     a          ab
 a+     ab         ab+
 a      a          a  
 a      ab         ab
Community
  • 1
  • 1
Aqua267
  • 873
  • 3
  • 12
  • 35
  • If col1 had john and col2 had joan, what would be your output? Would it be johan (in that order of characters) or would it be johna? – zedfoxus Aug 14 '14 at 15:22
  • @zfus My columns have only a limited set of characters like the sample data I gave. So I am not looking for a formula that covers an extensive dataset. To your question, johna is the output I am looking for. Put all characters in col1 and append only characters from col2 that are unique. Thanks! – Aqua267 Aug 14 '14 at 15:32
  • So, what's your question? How to get rid of duplicates or how to create the union formula? Basically, are you saying you want to get rid of, in your example, the 4th row? – John Bustos Aug 14 '14 at 15:36
  • @Aqua267 Understood. Let me see if I can whip up a VBA function quick – zedfoxus Aug 14 '14 at 15:37
  • Also, given your response to @zfus, why would your second row not be `a+b`? - That would be taking the first column and simply adding in the character missing from the second... – John Bustos Aug 14 '14 at 15:40
  • @JohnBustos I am looking to union Col1 and Col2 to produce a new column Union. I am not looking to get rid of any rows, but to create the Union column with the merged text. Hope I am making it clear. – Aqua267 Aug 14 '14 at 15:40
  • @JohnBustos the + is a special case which should always come in the end. For other characters what I said holds true. Thanks! – Aqua267 Aug 14 '14 at 15:41
  • @Aqua267 my answer lacks this special case, but can be worked-around by putting a+ in col2 instead of col1 or doing `=RemoveDupes(B1 & A1)` for the given example above. Just FYI. – zedfoxus Aug 14 '14 at 15:58

2 Answers2

2

Go to VBA [1], right click your VBAProject > Insert > Module. Type this function [2]. Then in column3 of your sheet type the formula =RemoveDupes(A1 & B1)

Public Function RemoveDupes(str As String) As String
Dim i As Long
Dim objDict As Object

    Set objDict = CreateObject("Scripting.Dictionary")

    For i = 1 To Len(str)
        If objDict.exists(Mid(str, i, 1)) Then
            'do nothing
        Else
            objDict.Add Mid(str, i, 1), i
            RemoveDupes = RemoveDupes & Mid(str, i, 1)
        End If
    Next i

    objDict.RemoveAll
    Set objDict = Nothing
End Function

If you want to roll your own long-winded one, you could do this:

  • create an array and split your string into characters for both your columns (essentially, you will have two arrays filled with characters)
  • create a new array
  • read each character of first array and insert in the new array only if it is not already there in the new array
  • read each character from second array and insert in the new array only if it is not already there in the new array
  • Join each character into a string and output that

[1] How do I open Excel VBA editor in Excel 2010

[2] Answered on Experts Exchange instead of rolling my own for this answer. Credits to dlmille

Community
  • 1
  • 1
zedfoxus
  • 35,121
  • 5
  • 64
  • 63
1

I'm just giving my solution to this too since I wanted to see how I would do it.

In VBA, use the following function:

Function Union(cl1 As String, cl2 As String) As String

Dim i As Integer
Dim chr As String
Dim PlusIncluded As Boolean

    PlusIncluded = False

    For i = 1 To Len(cl1)
        chr = Mid(cl1, i, 1)
        If chr = "+" Then
            PlusIncluded = True
            cl1 = Replace(cl1, chr, "")
        End If

        cl2 = Replace(cl2, chr, "")
    Next i

    Union = cl1 & cl2

    If PlusIncluded Then Union = Union & "+"

End Function

Then you can access it in your excel sheet - something along the lines of =Union(A1,B1)

Basically, to explain the function - It takes the 2 cells data and then:

  • Loops through all the characters in the first cell
  • If it is a plus sign, it removes it from the first cell and sets a flag to know we encountered one
  • It removes that character from the second cell's data
  • It then simply adds what's left from the second cell to the first and a plus if the flag is set to true

Hope this helps!

John Bustos
  • 19,036
  • 17
  • 89
  • 151