-2

I have a worksheet which I use to import XML data. I'd like to use VBA to find the duplicates within the data range (only in one column) and extract duplicates along with the number of times it appears as shown in the picture below. The original data should be retained as I will continuously add on to the column with new XML data. As for now I only found ways to highlight or remove duplicates but I want to visualize the data better by having a separate list to see data that is repeated the most. I'm only interested in the duplicates so data that only appears once can be ignored.

EDIT: I have thousands or rows of data to work with and I won't be sure which ones will have duplicates so I think using countif with every row would be pretty inefficient.

Thanks!

This is how it should look

Benjamin Sow
  • 3
  • 1
  • 6
  • u can use excel functions like count if and remove duplicates instead of VBA. – Techie Jun 27 '18 at 06:57
  • @techie I'm using VBA to import the XML so it'd be nice if I could just macro everything. Count if is interesting but I should add that the data in each cell is pretty long and I wouldn't know what they are exactly (not as simple as AAA or BBB) but I do know that there will be duplicates. – Benjamin Sow Jun 27 '18 at 07:08
  • Have a quick look around for examples like here: https://stackoverflow.com/questions/36044556/quicker-way-to-get-all-unique-values-of-a-column-in-vba or here https://analysistabs.com/vba/find-duplicate-values-column/. Try something with these pieces of code and come back to tell us where in your code you have issues. – JvdV Jun 27 '18 at 07:19
  • Possible duplicate of [How can I remove duplicate rows?](https://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows) Questions like these get posted here every day. Do a bit of research please – Samuel Hulla Jun 27 '18 at 07:58
  • @Rawrplus As mentioned in the question I have done my research and found methods for removing duplicate rows but that's not what I'm looking for. – Benjamin Sow Jun 27 '18 at 08:44
  • @BenjaminSow With all due to respect, I don't think replacing two lines of code, where instead of deleting the row you `PasteSpecial` to your set `Range` is all that difficult that would warrant a new question. – Samuel Hulla Jun 27 '18 at 08:49
  • @Rawrplus I understand where you are coming from, but I'm looking to not only copy the rows, but also obtain the number of occurances and paste that value. I started vba yesterday without any prior knowledge and did pretty much a whole day of trial and error and Googling before asking this. As seen by Sean's answer it is more than two lines two get both of the items working. Sorry if this is unwarranted. – Benjamin Sow Jun 27 '18 at 08:57
  • @BenjaminSow Alright. Thanks for clearing it up. Maybe to simplify the question for you, **it would be better to look at your question as two seperate issues - tasks**, rather than a one big issue: **1) Implement an algorithm that detects and prints duplicities** and **2) Create an algorithm that counts the amount of occurances.** When you dissect your problems like so, the question gets divided into two very trivial issues that you can find easily on the internet - which was my point to begin with. Though given you're new to VBA I understand this may not occur to you naturally, so fair game. – Samuel Hulla Jun 27 '18 at 09:02
  • @Rawrplus Yeah after I posted this question I realized the word instances and occurrence would fit better and I did find some sites that helped with counting them. Thanks for the suggestion. – Benjamin Sow Jun 27 '18 at 09:05
  • Glad it helped. By the way, what I described here is usually a very good (if not the best) approach to think of any coding tasks you're about to do. If you try to think of something as one giant issue, it will almost always end up overwhelming you. However, if you divide your issue into multiple ones, not only this helps you simplify the entire matter at hand, but actually helps you create a bettter code structure (eg. you think of using functions rather than one giant block) and suddenly, the insurmountable slope becomes few easy steps towards achievement - definitely a practice to live by :) – Samuel Hulla Jun 27 '18 at 09:13

2 Answers2

0

Tested and working:

Option Explicit

Sub find_dups()

    ' Create and set variable for referencing workbook
    Dim wb As Workbook
    Set wb = ThisWorkbook

    ' Create and set variable for referencing worksheet
    Dim ws As Worksheet
    Set ws = wb.Worksheets("Data")

    ' Find current last rows
    ' For this example, the data is in column A and the duplicates are in column C
    Dim lngLastRowData As Long
    lngLastRowData = ws.Range("a1048576").End(xlUp).Row
    Dim lngLastRowDups As Long
    lngLastRowDups = ws.Range("c1048576").End(xlUp).Row

    ' Create and set a variable for referencing data range
    Dim rngData As Range
    Set rngData = ws.Range("a2:a" & lngLastRowData)

    Dim lngRowCount As Long
    lngRowCount = 0

    Dim clData As Variant
    Dim lngCount As Long

    Dim lngRowIndexData As Long
    Dim lngRowIndexDups As Long
    lngRowIndexDups = lngLastRowDups + 1

    ' Variable to store those values we've already checked
    Dim strAlreadySearched As String


    For Each clData In rngData.Cells

        ' Reset variables
        lngCount = 0


        ' See if we've already searched this value
        If InStr(1, strAlreadySearched, "|" & clData.Value & "|") = 0 Then

            ' We haven't, so proceed to compare to each row
            For lngRowIndexData = 1 To lngLastRowData

                ' If we have a match, count it
                If rngData.Cells(lngRowIndexData, 1).Value = clData.Value Then
                    lngCount = lngCount + 1
                End If

            Next lngRowIndexData

            ' If more than 1 instance
            If lngCount > 1 Then
                ' Dup's were found, fill in values under duplicates
                ws.Cells(lngRowIndexDups, 3).Value = clData.Value
                ws.Cells(lngRowIndexDups, 4).Value = lngCount

                ' Drop down a row
                lngRowIndexDups = lngRowIndexDups + 1

                ' Capture this value so we don't search it again
                strAlreadySearched = strAlreadySearched & "|" & clData.Value & "|"


            End If
        End If

    Next clData



End Sub
SeanW333
  • 479
  • 4
  • 9
  • Thanks for the comprehensive answer @SeanW333. It worked perfectly although it took me sometime to go through the code, didn't think it would be a lengthy one. However, I'm curious as to how would one go about checking duplicates over two columns? As in only count as a duplicate if there are more than one set of two cells adjacent to each other. As a beginner the easiest way for me is to to concatenate each row in the two columns into one column and then use the above code but I'm guessing there should be a better way? – Benjamin Sow Jul 01 '18 at 13:59
  • Actually, the concatenation method is perfectly valid. However, if that feels hacky to you, you could also replace this: `If rngData.Cells(lngRowIndexData, 1).Value = clData.Value Then` with this: `If rngData.Cells(lngRowIndexData, 1).Value = clData.Value AND rngData.Cells(lngRowIndexData, 2).Value = clData.Offset(0,1).Value Then` (assuming the additional column is the one to the right of the main column), which achieves the same thing. – SeanW333 Jul 01 '18 at 17:23
-1

You can use range("a:a").RemoveDuplicates in vba code. This will remove all the duplicates. Or you can use conditional formatting to color duplicates.

enter image description here