1

I'm collecting data from a spreadsheet and storing it in a 2-D Array, the idea is that once the script detects it's reading from a specific column, it would not read an entire row of the data (as this would be considered a duplicate).

CODE:

Private Sub LoadData()

   cDOC_DEBUG "Loading document data..."
   Dim x As Long  'Column Data - there is another function that reads when x = 0 = header; else every other value is considered "data"
   Dim y As Long 

   With dataWS
      For x = 1 To LR - 1
         For y = 1 To LC - 1
            If (IsInArray(.Cells(x + 1, y + 1).value, pData())) Then
               cDOC_DEBUG "Added: " & .Cells(x + 1, y + 1).value
               pData(x, y) = Trim(.Cells(x + 1, y + 1).value)
            End If
         Next y
      Next x
   End With

End Sub

Private Function IsInArray(stringToBeFound As String, arrString As Variant) As Boolean
   IsInArray = (UBound(Filter(arrString, stringToBeFound)) > -1)
End Function

Private Sub cDOC_DEBUG(debugText As String)
   If (ThisWorkbook.Worksheets("Settings").Cells(3, 2)) Then
      Debug.Print debugText
   End If
End Sub

Everything is loading into the array fine, until I start implementing my IsInArray function. I can see it has to do with the fact that it's searching through a single dimensional array, and my array is two dimensional; so it makes sense that it's getting a type mismatch error.

Each row within the spreadsheet is a segment of information that correlates to it's self.

Initial Data From Spreadsheet:

        A           B           C           D
1    header1     header2     header3     header4
2       a           b           c           d
3       w           x           y           z
4       a           h           j           j
5       a           b           j           d
6       w           x           u           z

2x2 Final Array:

        0           1           2           3
0    header1     header2     header3     header4
1       a           b           c           d
2       w           x           y           z
3       a           h           j           j

Because Header1 & Header2 & Header4 from Excel rows 5 & 6 have the same values as Excel rows 2 and 3, this will not be read into the array.

Question:

How would I match the criteria above to not include the duplicates from a row.

Example Sudo Code:

If (Value being added matches all values from column Header1 & Header2 & Header3_ Then

Don't add to array

Another issue that I am aware of, is that there will be blank data within this array; is there something I can do to either 1 remove these or will I have to have another index for the array slots to keep track of?

Maldred
  • 1,074
  • 4
  • 11
  • 33
  • To clarify, that's called a 2-D (or 2 dimensional) array. (I think 2x2 would imply there are a total of 4 elements ie., `A1:B2`). Beyond that - maybe I don't fully understand your definition of "duplicate" here -- you want to remove rows where "header 3" is duplicated? – ashleedawg Sep 13 '18 at 15:57
  • @ashleedawg Sorry, yes you are correct it is a 2-D array. To clarify, if there is a duplicate value found in Header1 and Header2 and Header3 this item would not be added to the array. In the example you can see that rows 5&6 were classified as duplicates and thus not included in the array. Array Row 5 was equal to Excel Row 2 and Array Row 6 was equal to Excel Row 3. – Maldred Sep 13 '18 at 16:06
  • I think there's a much easier way than using VBA but I'm still not clear. If Header 1 **or** Header 2 **or** Header 3 have duplicates? – ashleedawg Sep 13 '18 at 16:10
  • ...I don't understand because there's still a duplicate in the first column. – ashleedawg Sep 13 '18 at 16:12
  • @ashleedawg Right, that is fine, so the criteria needs to be (Header1 & Header2 & Header3), you're thinking it in terms (Header1 Or Header2 Or Header3). I apologize about my explanation; I am having a hard time wording that lol – Maldred Sep 13 '18 at 16:16
  • ...Do you mean "matches **any**" instead of "all". It can't match "all" unless all 4 columns are the same – ashleedawg Sep 13 '18 at 16:16
  • In your example "final array", the first column still has duplicates (`a`'s) and the last row still has duplicates (`j`'s)... I don't understand – ashleedawg Sep 13 '18 at 16:21
  • 1
    @ashleedawg I am not comparing individual values from a single row, I'm comparing rows against each other. Row 5 was equal to Row 2, so it was not included in the final array. Row 6 was equal to Row 3, so it was not included in the final array. ONLY using columns A, B, and D; Column C was not including in the criteria – Maldred Sep 13 '18 at 16:24
  • @Maldred, added a solution using a dictionary check and `Application.Index` function. would appreciate your feed back :-) – T.M. Sep 13 '18 at 18:53

2 Answers2

1

You can loop rows/columns and use Index to slice a row/column out of the array and use Match to test if search value is in that column. Combine with Count to test for duplicates. If the count equals the number of columns ignore value (or column count -1... see next comment ==>). Not entirely sure about this imaginary column. Do you intend to dimension at start with an additional empty column?

Row Versions:

Exists:

Option Explicit
Public Sub CheckRow()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 1) To UBound(arr, 1)     '<== loop rows
        'look in each row for x and if found exit loop and indicate row where found
        If Not IsError(Application.Match("x", Application.WorksheetFunction.Index(arr, i, 0), 0)) Then
            Debug.Print "value found in column " & i
            Exit For
        End If
    Next
End Sub

Duplicates:

Option Explicit
Public Sub CheckRow()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 1) To UBound(arr, 1)     '<== loop rows
        'look in each row for more than one "B" and if found exit loop and indicate row where found
         If Application.Count(Application.Match(Application.WorksheetFunction.Index(arr, i, 0), "B", 0)) > 1 Then
            Debug.Print i
            Exit For
        End If
    Next
End Sub

exists:


Columns versions:

Exists:

Option Explicit
Public Sub CheckColumn()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 2) To UBound(arr, 2)     '<== loop columns
        'look in each column for x and if found exit loop and indicate column where found
        If Not IsError(Application.Match("x", Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), 0)) Then
            Debug.Print "value found in column " & i
            Exit For
        End If
    Next
End Sub

Duplicates:

You can use Count to check for duplicates within an entire column, again sliced with Index:

Option Explicit
Public Sub CheckColumn()
    Dim arr(), i As Long
    arr = [A1:D6].Value                          '<==2D array created

    For i = LBound(arr, 2) To UBound(arr, 2)     '<== loop columns
        'look in each column for more than one "B" and if found exit loop and indicate column where found
         If Application.Count(Application.Match(Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), "B", 0)) > 1 Then
            Debug.Print i
            Exit For
        End If
    Next
End Sub

Using sample data in sheet:

data

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • I want to find the row and slice the row out, not columns... I'm not sure if this would qualify for what I'm looking for. – Maldred Sep 13 '18 at 16:08
  • Your wording was "How would I search an entire (theoretical) column " - so definitely rows? I can easily change to rows. – QHarr Sep 13 '18 at 16:10
  • @QHarr - Woluldn't "remove duplicates" be much easier? I'm not clear on the details - you might have to run it 3 times each time with one column selected.. or maybe i totally misunderstand? – ashleedawg Sep 13 '18 at 16:14
  • My apologies, I have edited the question to be more specific to what I am looking for – Maldred Sep 13 '18 at 16:14
  • You can use Count version,. I think, and if count = 4 or whatever the magic number is then don't add. – QHarr Sep 13 '18 at 16:19
  • 1
    @ashleedawg I no longer know. If OP can use the above great. Otherwise, I can delete and go grab a . – QHarr Sep 13 '18 at 16:22
  • @QHarr I will try the Count version and see what I can get, that seems to be the best solution. I will get back to you once I get it working. Thanks for the help! – Maldred Sep 13 '18 at 16:42
  • @Maldred - referring to the above comment: did you try the proposed approaches and get them working? So you could help other readers to identify helpful solutions by accepting and/or upvoting :-) – T.M. Sep 29 '18 at 10:42
0

Alternative using advanced Index function

This approach using a (late bound) dictionary should be helpful, if your data rows don't exceed the number of 65536. You'll get a 2-dim (1-based) array v with the unique data set of columns A,B and D.

In this example code results are written back to e.g. columns F:H and values of column C are omitted; if you want to maintain these values see ► Edit below.

Example code (omitting column C in resulting array)

Sub getUniqueRows()
Dim dict As Object, v, i&, ii&, n&, currRow$
Set dict = CreateObject("Scripting.Dictionary")         ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet")                 ' << change to your sheet name
  n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1     ' n items (omitting header line)
' [1] get data
  v = .Range("A2:D" & n + 1).Value
' [2a]remove column C (i.e. allow columns 1, 2 and 4 only)
  v = Application.Index(v, Evaluate("row(1:" & n & ")"), Array(1, 2, 4))
' [2b] check for unique ones
  For i = 1 To n
     currRow = Join(Application.Index(v, i, 0), ",") ' build string of cells A,B & D
     If Not dict.Exists(currRow) Then dict.Add currRow, i
  Next i
' [3] remove duplicate rows
  v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 3 & ")")))
' [4] write data to any wanted range
  .Range("F:H") = ""                                 ' clear rows
  .Range("F2").Resize(UBound(v), 3) = v              ' write data
End With
Set dict = Nothing
End Sub

Note

The dict.Items collection in section [3] is an array of all found item numbers in the dictionary and allows the Index function to get only these items.

Additional links

See Insert new first column in datafield array without loops or API call

Edit - maintain values in column C

Due to comment: "ONLY using columns A, B, and D; Column C was not including in the criteria."

If you want to check values only in A,B and D, but maintain the C values in the resulting array you can use the following optimized code neglecting an empty values row.

Sub getUniqueRows2()
Dim dict As Object, v, i&, n&, j&, currRow$
Set dict = CreateObject("Scripting.Dictionary")          ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet")                  ' << change to your sheet name
     n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1      ' items counter  (omitting header line)
   ' [1] get data
     v = .Range("A2:D" & n + 1).Value
   ' [2] check for unique ones
     For i = 1 To UBound(v)
       ' assign ONLY criteria of 1st, 2nd & 4th column to string value currRow
         currRow = ""
         For j = 0 To 2: currRow = currRow & v(i, Array(1, 2, 4)(j)) & ",": Next j
       ' add first unique occurrence to dictionary
         If Not dict.Exists(currRow) Then                 ' add first occurrence
             If Len(currRow) > 3 Then dict.Add currRow, i ' ... and ignore empty values
         End If
     Next i
   ' [3] remove duplicate rows
     v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 4 & ")")))
   ' [4] write resulting array values anywhere, e.g. to columns F:I
     .Range("F:I") = ""                                   ' clear rows
     .Range("F2").Resize(UBound(v), 4) = v                ' write data
End With
Set dict = Nothing
End Sub
T.M.
  • 9,436
  • 3
  • 33
  • 57