2

I have some tabular data as follows.

|   | A        | B            | C                | D                                                 |
|---|----------|--------------|------------------|---------------------------------------------------|
|   |          | p1           | p2               | pn                                                |
| 1 | Lanterns | Bruce Wayne  | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley    |
| 2 | Bats     | Alan Scott   | Hal Jordan       | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz |
| 3 | Fates    | Kent Nelson  | Khalid Nassour   | Hector Hall; Khalid Nassour; Khalid Ben-Hassin    |
| 4 | Supes    | Clark Kent   | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan             |
| 5 | Spideys  | Peter Parker | Peter Parker     | Ben Reilly; Miles Morales                         |
| 6 | Irons    | Tony Stark   | Happy Hogan      | James Rhodes; Eddie March; James Rhodes           |

For each row, I want to find if duplication exists between columns B, C as well as semi colon separated values of column D.

How to do this in excel?

The desired output will be as follows.

| X | A        | B            | C                | D                                                 | E     |
|---|----------|--------------|------------------|---------------------------------------------------|-------|
|   |          | p1           | p2               | pn                                                |       |
| 1 | Lanterns | Bruce Wayne  | Jean-Paul Valley | Dick Grayson; Terry McGinnis; Jean-Paul Valley    | TRUE  |
| 2 | Bats     | Alan Scott   | Hal Jordan       | Guy Gardner; John Stewart; Kyle Rayner; Simon Baz | FALSE |
| 3 | Fates    | Kent Nelson  | Khalid Nassour   | Hector Hall; Khalid Nassour; Khalid Ben-Hassin    | TRUE  |
| 4 | Supes    | Clark Kent   | John Henry Irons | Conner Kent; Hank Henshaw; Kong Kenan             | FALSE |
| 5 | Spideys  | Peter Parker | Peter Parker     | Ben Reilly; Miles Morales                         | TRUE  |
| 6 | Irons    | Tony Stark   | Happy Hogan      | James Rhodes; Eddie March; James Rhodes           | TRUE  |

Edit There was an error in the column names in the question causing lack in clarity. Fixed it now.

Update

Here is my attempt with VBA as suggested by @Foxfire And Burns And Burns. It is adapted from https://superuser.com/a/1005497/460054

Public Function HasDuplicates(list As String, delimiter As String) As String
Dim arrSplit As Variant, i As Long, tmpDict As Object, tmpOutput As Boolean
Set tmpDict = CreateObject("Scripting.Dictionary")
arrSplit = Split(list, delimiter)
tmpOutput = False
For i = LBound(arrSplit) To UBound(arrSplit)
    If tmpDict.Exists(Trim(arrSplit(i))) Then
        tmpOutput = True
        Exit For
    Else
    tmpDict.Add Trim(arrSplit(i)), Trim(arrSplit(i))
    End If
Next i
HasDuplicates = tmpOutput
'housekeeping
Set tmpDict = Nothing
End Function

Here is all possibel use cases again as suggested by @Foxfire And Burns And Burns.

+---+-----+----+-----------+--------------------+-------+
|   |  A  | B  |     C     |         D          |   E   |
+---+-----+----+-----------+--------------------+-------+
| 1 | A   | B  |           | A; B;              | False |
| 2 | A   |    |           | A; ;               | True  |
| 3 |     |    |           | ; ;                | True  |
| 4 | G   | K  | G         | G; K; G            | True  |
| 5 | N   | M  | O         | N; M; O            | False |
| 6 | N   | N  | O         | N; N; O            | True  |
| 7 | V   | U  | X; Y; X   | V; U; X; Y; X      | True  |
| 8 | P J | VK | P; J; V K | P J; VK; P; J; V K | False |
| 9 | VK  | O  | R; VK     | VK; O; R; VK       | True  |
+---+-----+----+-----------+--------------------+-------+

The formula for columnD is =CONCATENATE(B2,"; ",C2, "; ",D2) and for E is =HasDuplicates(E2, ";").

But here it is not handling empty cells. Rows 2 and 3 should also be False.

Crops
  • 5,024
  • 5
  • 38
  • 65
  • So you need to check if values of column B are duplicated in column C and also check if the values separated by semi colon in column D are duplicated in same cell? No relation between B or C with D? – Foxfire And Burns And Burns Sep 24 '19 at 11:00
  • Yes and also if values in B and C are present in D too. – Crops Sep 24 '19 at 11:07
  • To check if the values separated by semi colon in column D are duplicated in same cell, I'm afraid you need VBA for that. Also, please, edit your question with that new info to make it clear, and you should post all the posibble cases and what results you expect in each case. – Foxfire And Burns And Burns Sep 24 '19 at 11:10
  • @FoxfireAndBurnsAndBurns I have updated the query with my attempt with vba. – Crops Sep 24 '19 at 12:07
  • @Crops My final formula, which adjusts for the separator not always being `; ` (as it is in some of your instances with the terminal `;`, will handle your last set of data. – Ron Rosenfeld Sep 24 '19 at 12:23

5 Answers5

6

If you have O365 or Excel 2016 with the TEXTJOIN function:

=NOT(ISERROR(FILTERXML("<t><s>" &TEXTJOIN("</s><s>",TRUE,TRIM(B2),TRIM(C2),SUBSTITUTE(TRIM(D2),"; ","</s><s>"))& "</s></t>","//s[.=./following-sibling::*]")))

If you don't have TEXTJOIN, but do have FILTERXML, you can use:

=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B2)&"</s><s>"&TRIM(C2)&"</s><s>"&SUBSTITUTE(TRIM(D2),"; ","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))

enter image description here

We construct an XML of all the names in separate nodes, and then look for duplicates.

Without the NOT(ISERROR(… part, the formula will return the name of the duplicate (or an array of the names if there are more than one set of duplicates).

NOTE: The formula depends on the separator in Column D being ; (semicolon-space). If the space will not always be there, the formula will need modification to remove it if present (nested substitutes, or TRIM's would do that).

eg

=NOT(ISERROR(FILTERXML("<t><s>"&TRIM(B11)&"</s><s>"&TRIM(C11)&"</s><s>"&SUBSTITUTE(SUBSTITUTE(TRIM(D11),"; ",";"),";","</s><s>")&"</s></t>","//s[.=./following-sibling::*]")))

Results on 2nd Test

enter image description here

If you have an earlier version of Excel, and can use a VBA solution, try:

Option Explicit
Function hasDups(rg As Range, Optional sDelim As String = ";") As Boolean
    Dim myDict As Object
    Dim x, y, s As String, i As Long, c As Range

Set myDict = CreateObject("scripting.dictionary")

For Each c In rg
    x = Split(c.Value2, sDelim)
    For Each y In x
      If Len(Trim(y)) > 0 Then
        If Not myDict.exists(Trim(y)) Then
            myDict.Add Trim(y), y
        Else
            hasDups = True
            Exit Function
        End If
      End If
    Next y
Next c

End Function
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Ron, would you believe it, I just used [this](https://stackoverflow.com/questions/54220420/how-to-use-a-string-from-a-single-cell-as-an-array-in-a-function-without-vba) post of yours as a reference point to utilize the `XML` function. ++ – JvdV Sep 24 '19 at 11:46
  • @JvdV :-) Note that the xPath language can include a test for duplicates, which simplifies things for this situation. – Ron Rosenfeld Sep 24 '19 at 11:52
  • I would plus you 10 times if possible. Very neat! Small suggestion if `TRIM` won't be needed: `=NOT(ISERROR(FILTERXML(""&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","")&"","//s[.=./following-sibling::*]")))`. Either way, brilliant stuff. – JvdV Sep 24 '19 at 12:00
  • @JvdV In the copy/paste I did from the question, `TRIM` was needed. With real data, who knows? But it doesn't hurt. – Ron Rosenfeld Sep 24 '19 at 12:16
4

It's possible through formula but there are some considerations:

  • Because of the names can be either Tony Stark and Anthony Stark or Paul Valley and Jean-Paul Valley you can't use SEARCH
  • You'll have to concatenate the three columns and turn that into an array
  • Check the array for duplicates (unfortunately without COUNTIF), with
  • Check the result against the total of items in the array.

    =SUMPRODUCT(--(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s")=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(B2&"; "&C2&"; "&D2,"",""),"; ","</s><s>")&"</s></t>","//s"))))<>LEN(B2&"; "&C2&"; "&D2)-LEN(SUBSTITUTE(B2&"; "&C2&"; "&D2,";",""))+1
    

It's a lenghty one, but FILTERXML puts our values in an array rather nicely, which we then can use.

Note: It's an array formula and needs to be confirmed through CtrlShiftEnter

enter image description here

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • Epic one!. Never heard of FILTERXML, because I do not have that function. My Excel is too old! – Foxfire And Burns And Burns Sep 24 '19 at 13:01
  • @FoxfireAndBurnsAndBurns, check [this](https://stackoverflow.com/questions/25316094/split-a-string-cell-in-excel-without-vba-e-g-for-array-formula) post by RonRosenfeld. It's an non-FILTERXML alternative to delimiter a string into an array of items. – JvdV Sep 24 '19 at 13:09
2

Try this formula in cell E1 and copy it down:

=IF(ISNUMBER(SEARCH(C1;D1));TRUE;FALSE)

You might need to change the semicolons ; to regular commas , depending on your version of Excel.

EDIT: A simpler and more elegant version would be

=ISNUMBER(SEARCH(C1;D1))
riskypenguin
  • 2,139
  • 1
  • 10
  • 22
  • This is not working. It is showing FALSE for row 5. – Crops Sep 24 '19 at 09:56
  • 2
    Could you provide a screenshot of what you're seeing, including the data and the formula you put in? I tried the formula on your sample data and it worked just fine. – riskypenguin Sep 24 '19 at 10:00
  • @Crops It's showing FALSE in row 5 because, according to the input you've posted, value `Peter Parker` is not present in column D. Same for row 6. `Happy Hogan` is not in Column D. I think this answer is correct – Foxfire And Burns And Burns Sep 24 '19 at 10:22
2

Could you please try this formula

=IFERROR(IF(SEARCH(C1,D1,1)>0,"TRUE"), "FALSE")
PASUMPON V N
  • 1,186
  • 2
  • 10
  • 17
1

If your values got extra spaces, maybe it can help to add a TRIM function to clean it.

I got this:

enter image description here

Column E got this formula:

=IFERROR(SEARCH(TRIM(C1);D1);0)>0

The output will be a boolean value TRUE/FALSE In screenshot it's in spanish but VERDADERO means TRUE and FALSO means FALSE.

In row 5 and 6 the output is FALSE becaue values Peter Parker and Happy Hogan are not found in column D. You posted in your wished output that these 2 rows should return TRUE, but you have not explained why.

  • In row 5, both `Peter Parker` is present in both columns B and C. For row 5, `James Rhodes` is duplicated within column D. – Crops Sep 24 '19 at 10:37