0

I currently trying to clean up a large dataset using Excel VBA. The dataset structure looks like this.

enter image description here

However, I would like to make it look like this instead, whereby if the cells in columns A:D all contain the same value, transpose the cells in column E. (And remove the duplicated cells from A:D)

enter image description here

Here is the code I did

Dim ws As Worksheet: Set ws = Sheets("test")
lastrow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

Dim j As Integer
j = 6

For i = 2 To lastrow

    If (Range("A" & i).Value = Range("A" & i + 1).Value) And (Range("B" & i).Value = Range("B" & i + 1).Value) And (Range("C" & i).Value = Range("C" & i + 1).Value) Then
        Cells(i, j).Value = Cells(i + 1, 5).Value
        j = j + 1
    End If
    
    'Reset J back to 6 if columns A to D does not match previous
    If (Range("A" & i).Value <> Range("A" & i + 1).Value) Or (Range("B" & i).Value <> Range("B" & i + 1).Value) Or (Range("C" & i).Value <> Range("C" & i + 1).Value) Then
        j = 6
    End If
    
Next i

How can this be done?

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
ta.ng
  • 55
  • 1
  • 8
  • Is the data always sorted like all `ABCD` come in one block and then `DEFG` starts or an they be mixed? – Pᴇʜ Aug 12 '21 at 14:49
  • It is always sorted in one block. – ta.ng Aug 12 '21 at 14:54
  • However, it can also be `ABCD` followed by `XYCD` (whereby only cells in columns A:B have different values) but I only want to transpose when cells in all 4 columns (from A:D) contain the same values. – ta.ng Aug 12 '21 at 15:09

3 Answers3

4

This ended up more complex than I'd thought but seems to work OK

Sub Compact()

    Const KEY_COLS As Long = 4
    Dim ws As Worksheet, i As Long, k As String, nextEmpty As Long
    Dim dict As Object
    Set dict = CreateObject("scripting.dictionary")
    
    Set ws = Sheets("test")
    For i = 1 To ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        'create a row "key" from first KEY_COLS cells
        k = Join(Application.Transpose(Application.Transpose( _
                  ws.Cells(i, 1).Resize(1, KEY_COLS))), "~~")
        
        If Not dict.exists(k) Then
            'move this row up?
            If nextEmpty > 0 Then
                ws.Cells(i, 1).Resize(1, KEY_COLS + 1).Cut ws.Cells(nextEmpty, 1)
                dict.Add k, nextEmpty 'new key - store row#
                nextEmpty = 0
            Else
                dict.Add k, i 'new key - store row#
            End If
        Else
            'seen this key before - move value to that row and clear
            ws.Cells(dict(k), Columns.Count).End(xlToLeft).Offset(0, 1).Value = _
                ws.Cells(i, KEY_COLS + 1).Value
            ws.Rows(i).ClearContents
            If nextEmpty = 0 Then nextEmpty = i 'available row
        End If
    Next i
End Sub

Edit: this is a bit cleaner I think. It's split into separate "read" and "write" parts.

Sub Compact2()

    Const KEY_COLS As Long = 4
    Const SEP As String = "~~"
    Dim ws As Worksheet, i As Long, k, col As Long, v
    Dim dict As Object
    Set dict = CreateObject("scripting.dictionary")
    
    Set ws = Sheets("test")
    'collect all the unique combinations and associated values 
    For i = 1 To ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        'create a row "key" from first KEY_COLS cells
        k = Join(Application.Transpose(Application.Transpose( _
                  ws.Cells(i, 1).Resize(1, KEY_COLS))), SEP)
        
        If Not dict.exists(k) Then dict.Add k, New Collection
        dict(k).Add ws.Cells(i, KEY_COLS + 1).Value
        ws.Rows(i).ClearContents 'clear row
    Next i
    
    're-populate the sheet from the dictionary
    i = 1
    For Each k In dict
        ws.Cells(i, 1).Resize(1, KEY_COLS).Value = Split(k, SEP)
        col = KEY_COLS + 1
        For Each v In dict(k)
            ws.Cells(i, col) = v
            col = col + 1
        Next v
        i = i + 1
    Next k
End Sub
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • Cool answer as always (I think I upvote more competing answer from you than anyone else). If you have time, can you take a look at my answer, and see if you can get my solution down to a dynamic formula in column m? Irks me I can't figure it out. I may post a question on it. – pgSystemTester Aug 12 '21 at 16:04
  • @pgSystemTester - I can take a look later but that formula stuff is mostly beyond me... I only just got the version with that capability so I'm a bit behind. – Tim Williams Aug 12 '21 at 16:07
  • No worries. I'll post it as a question and someone could point out what I'm doing wrong. BigBen or Peh have a history of answering my questions in like 55 seconds and making me feel like I should have known better. – pgSystemTester Aug 12 '21 at 16:08
  • Posted question here: https://stackoverflow.com/questions/68761456/converting-multidimensional-arrays-into-a-spill-range – pgSystemTester Aug 12 '21 at 16:54
1

Agree with the Tim Williams this is tricky. I got sort of close to a solution without using VBA in this worksheet (requires spill range enabled). I didn't get a dynamic formula to spill down for the numeric values, but you could make a macro to drag it or something.

See this spreadsheet.

You would need the below formula in cell i1

=UNIQUE(FILTER(A:D,NOT(ISBLANK((A:A)))))

The following formula would be in M1, and dragged down to match the respective columns to the immediate left. You could setup a macro that actually did this for you on a change event. There's probably a way to make this dynamic with an array formula, but I couldn't assemble it in time I tinkered with it.

=TRANSPOSE(FILTER(E:E,(NOT(ISBLANK(E:E))*(A:A&B:B&C:C&D:D=I1&J1&K1&L1))))

Again if you don't have excel spill range capabilities, this won't work. To view with spill range, checkout the excel file via a web browser so it looks like the below image. The gray cells contain the respective formulas.

Sample of browser result

pgSystemTester
  • 8,979
  • 2
  • 23
  • 49
  • I posted a question regarding my own answer on this here: https://stackoverflow.com/questions/68761456/converting-multidimensional-arrays-into-a-spill-range – pgSystemTester Aug 12 '21 at 16:55
1

You can do this pretty easily using Power Query

  • Group by the first four columns
  • Aggregate the 5th column into a delimiter (semicolon) separated text string.
  • Split the delimited string into new columns

For the example, I added some rows where the four columns didn't match

To use Power Query

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm

M Code

let
    Source = Excel.CurrentWorkbook(){[Name="Table17"]}[Content],

//set type for all columns as Text
    #"Changed Type" = Table.TransformColumnTypes(Source,List.Transform(Table.ColumnNames(Source), each {_, Text.Type})),

//group by first four columns, then aggregate the 5th column semicolon separated
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Column1", "Column2", "Column3", "Column4"}, {
        {"ColE", each Text.Combine([Column5],";"), Text.Type}
    }),

//split the aggregated text into new columns
//may need to edit this step depending on maximum number in the group
    #"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "ColE", 
        Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"ColE.1", "ColE.2", "ColE.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{
        {"ColE.1", Int64.Type}, {"ColE.2", Int64.Type}, {"ColE.3", Int64.Type}})
in
    #"Changed Type1"

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60