1

I have an Excel with the following structure which I want to convert to another structure:

Tranpose Year from Columns to Rows

The actual file is much more complex than this - but I created this schematic to describe the essence of the problem. The file currently has around 5K rows, but it is expected to contain around 50K-100K entries. So, the solution should have a good performance.

What I have in mind is

  1. Copy columns from Customer Name to Unit Price and Year 1 Quantity & Year 1 TotalCost and paste it in the destination range and add a column year number and populate it with 1
  2. Copy columns from Customer Name to Unit Price and Year 2 Quantity & Year 2 TotalCost and populate year number column with 2

The questions I have are these:

  1. Will this solution perform well?
  2. Are there other solutions that avoids copying and pasting multiple times?
  3. Is there a way to update the source range itself without pasting the data in a new destination range?

Homework I have done:

I did a google search and tried to read as many article as I could. I also read the following threads in Stackoverflow, but none of them has the answer that I am looking for

Converting Excel rows to columns (smarter than transpose)

Transpose multiple rows to multiple columns

Excel Converting rows to columns with groups

Community
  • 1
  • 1
Siraj Samsudeen
  • 1,624
  • 7
  • 26
  • 35
  • Is this a one time reorganise of the data, (ie once done will you keep the output data only) or do you want to keep the input data, and keep the output up to date as new data is added to input? – chris neilsen Sep 27 '12 at 10:11
  • @chrisneilsen - After we reorganize, I need only the output data. But this is a periodic exercise that can be run with many datasets. – Siraj Samsudeen Sep 27 '12 at 11:58
  • for anything that (*50K-100K*) size you should really consider using a database instead of a spreadsheet. –  Nov 11 '13 at 16:14

2 Answers2

1

I think you should consider transferring this application to Access or other database. The rest of this answer assumes that this is not possible at this time.

The approach you are considering has a slow loop:

With Worksheets("Input")
  Cut
End With
With Worksheets("Output")
  Paste
End With

I would :

  • upload the entire UsedRange from Worksheet Input to Array1
  • analyse Array1 to determine the size of Worksheet Output
  • create an appropriately sized Array2
  • move data from Array1 to Array2
  • download Array2 to worksheet Output.

If you need example code, I am happy to provide some. I could code for your example sheet but some characteristics of the real worksheets would give you more useful code for little extra effort on my part.

Part 2

You say "The actual file is much more complex than this - but I created this schematic to describe the essence of the problem."

I have assumed:

  • The columns not linked to a particular year are on the left.
  • For each year there are the same columns in the same sequence.
  • All header cells have the same foreground and background colours and the same single bold state.
  • Horizontal alignment of data cells is the default for the data type.
  • There need not be data for every year for every customer/product combination.
  • The number formats for the first data row may be applied to all rows.
  • The value in row 1 of the first column of a block of year columns may be used for the year column in the output.

I created worksheet Input and created the 20 data rows. I duplicated data rows 3 to 22 down to create 5,000 data rows. I assume this is a fair representation of your data:

Sample input data

The output from the macro is in worksheet Output:

Output from macro for sample input data

This is what I believe you seek. I have rearranged the data as specified. I have copied the format of the header rows, the column widths and the number formats across. If you have formulae in the input they will be values in the output.

For 5,000 rows the macro takes about .1 seconds to copy the data and about .05 seconds to apply the formatting.

In the code I have included comments to say what I am doing and why I am doing it but there are not many comments explaining the VBA statements. For example the first statement is Option Explicit. It is easy to look this up in VB Help or you can search the internet for "Excel VBA Option Explicit". Come back with questions if necessary.

Hope this helps.

Option Explicit
Sub Reformat()

  Dim CellHeaderColourBack As Long
  Dim CellHeaderColourFore As Long
  Dim CellHeaderBold As Boolean
  Dim CellInValue() As Variant
  Dim CellOutHeaderHAlign() As Long
  Dim CellOutNumberFormat() As String
  Dim CellOutValue() As Variant
  Dim ColInCrnt As Long
  Dim ColInCrnt2 As Long
  Dim ColInMax As Long
  Dim ColOutCrnt As Long
  Dim ColOutMax As Long
  Dim ColWidth() As Single
  Dim NumRowsData As Long
  Dim RowInCrnt As Long
  Dim RowInMax As Long
  Dim RowOutCrnt As Long
  Dim RowOutMax As Long
  Dim TimeStart As Single

  ' I use constants to define values that might change.  For example, you have
  ' two header rows so the first data row is 3.
  ' "For RowCrnt = RowDataFirst to RowMax" instead of
  ' "For RowCrnt = 3 to RowMax"
  ' makes the code easier to understand and makes it easy to update the code
  ' if you add another header row.
  Const RowDataFirst As Long = 3   ' First data row
  Const NumNonYearCols As Long = 4 ' Number of columns not linked to a year
  Const NumColsPerYear As Long = 2 ' Number of columns per year

  TimeStart = Timer     ' Seconds since midnight

  With Worksheets("Input")

    ' There are several ways of identifying the last column and the last row.
    ' None work in every situation.  I think this method should be satisfactory
    ' for your worksheet although there is a warning later about ColMax.
    ColInMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
    RowInMax = .Cells.SpecialCells(xlCellTypeLastCell).Row

    ' Debug.Print output to the Immediate Window.  I have left diagnostic
    ' outputs within the code.  Delete once you have adapted the code to
    ' your requirements.
    Debug.Print "ColInMax=" & ColInMax & "  RowInMax=" & RowInMax

    ' I never did much programming in C++ or Java but I never used a language
    ' that did not have an Assert statement of some kind.
    ' A key assumption of the code is that the the number of columns is of the
    ' form: NumNonYearCols + NunYears * NumColsPerYear.
    ' The interpreter will stop on this statement if this assumption is untrue.
    ' If the interpreter does stop even though you think the assumption is true,
    ' you will probably have a stray value or formatted cell to the right of the
    ' main data table.  Try deleting columns to the right of the data table.
    ' Alternatively, set ColInMax = NumNonYearCols + NumYears * NumColsPerYear
    ' so the extract ignores anything outside the data table.
    Debug.Assert (ColInMax - NumNonYearCols) Mod NumColsPerYear = 0

    ' Load all values within the worksheet to the array CellValue.
    CellInValue = .Range(.Cells(1, 1), .Cells(RowInMax, ColInMax)).Value
    ' CellInValue will now be a two dimensional array.  Dimension 1 will be for
    ' rows and dimension 2 will be for columns.  This is not conventional for
    ' arrays but matches the VBA for accessing cells.
    ' The lower bound for both dimensions will be 1.

    ' Record the formatting of cell A1 so this can be applied to all header
    ' cells in worksheet Output.  If the formatting is more complicated than
    ' this, it will probably be easier to copy and paste the header rows from
    ' the input to the output worksheet.
    With .Cells(1, 1)
      CellHeaderColourBack = .Interior.Color
      CellHeaderColourFore = .Font.Color
      ' Warning the bold state of a cell will be non-boolean if
      ' some characters are bold and some are not.
      CellHeaderBold = .Font.Bold
    End With

    ' Calculate number of columns in worksheet Output
    ColOutMax = NumNonYearCols + 1 + NumColsPerYear

    ' Record column widths and number formats for first data row and horizontal
    ' alignment for last header row.
    ' The column widths will be applied to the relevant output columns
    ' The number formats will be applied to data cells in the relevant
    ' output column.
    ' The horizontal alignments  will be applied to header cells in the
    ' relevant output column.
    ReDim ColWidth(1 To ColOutMax)
    ReDim CellOutNumberFormat(1 To ColOutMax)
    ReDim CellOutHeaderHAlign(1 To ColOutMax)

    ColOutCrnt = 1
    ' Non-year-linked columns
    For ColInCrnt = 1 To NumNonYearCols
      ColWidth(ColOutCrnt) = .Columns(ColInCrnt).ColumnWidth
      CellOutNumberFormat(ColOutCrnt) = _
                                 .Cells(RowDataFirst, ColInCrnt).NumberFormat
      CellOutHeaderHAlign(ColOutCrnt) = _
                      .Cells(RowDataFirst - 1, ColInCrnt).HorizontalAlignment
      ColOutCrnt = ColOutCrnt + 1
    Next
    ' Year column
    ColWidth(ColOutCrnt) = 5
    CellOutNumberFormat(ColOutCrnt) = "General"
    CellOutHeaderHAlign(ColOutCrnt) = xlRight
    ColOutCrnt = ColOutCrnt + 1
    ' Year-linked columns
    For ColInCrnt = NumNonYearCols + 1 To NumNonYearCols + NumColsPerYear
      ColWidth(ColOutCrnt) = .Columns(ColInCrnt).ColumnWidth
      CellOutNumberFormat(ColOutCrnt) = _
                                 .Cells(RowDataFirst, ColInCrnt).NumberFormat
      CellOutHeaderHAlign(ColOutCrnt) = _
                      .Cells(RowDataFirst - 1, ColInCrnt).HorizontalAlignment
      ColOutCrnt = ColOutCrnt + 1
    Next

  End With

  ' I have now extracted everything I want from worksheet Input.

  ' Worksheet Output will have 1 data row per value in a Quantity column.
  ' Count these values.
  NumRowsData = 0
  For RowInCrnt = RowDataFirst To RowInMax
    For ColInCrnt = NumNonYearCols + 1 To ColInMax Step NumColsPerYear
      If CellInValue(RowInCrnt, ColInCrnt) <> "" Then
        NumRowsData = NumRowsData + 1
      End If
    Next
  Next

  Debug.Print NumRowsData

  ' Size CellOutValue so it can hold all the data for Worksheet Output.
  ' ColOutMax = NumNonYearCols + 1 + NumColsPerYear   ' Calculated earlier
  RowOutMax = RowDataFirst - 1 + NumRowsData
  ReDim CellOutValue(1 To RowOutMax, 1 To ColOutMax)

  ' Build new header rows.

  ' Copy header cells for non-year-linked columns
  RowOutCrnt = 1
  For RowInCrnt = 1 To RowDataFirst - 1
    ColOutCrnt = 1
    For ColInCrnt = 1 To NumNonYearCols
      CellOutValue(RowOutCrnt, ColOutCrnt) = CellInValue(RowInCrnt, ColInCrnt)
      ColOutCrnt = ColOutCrnt + 1
    Next
    RowOutCrnt = RowOutCrnt + 1
  Next

  ' Create header for new column
  CellOutValue(RowDataFirst - 1, ColOutCrnt) = "Year"

  ' Copy one set of year-linked column header cells
  RowOutCrnt = 2        ' Row 1 holds year numbers
  For RowInCrnt = 2 To RowDataFirst - 1
    ColOutCrnt = NumNonYearCols + 2
    For ColInCrnt = NumNonYearCols + 1 To NumNonYearCols + NumColsPerYear
      CellOutValue(RowOutCrnt, ColOutCrnt) = _
                                           CellInValue(RowInCrnt, ColInCrnt)
      ColOutCrnt = ColOutCrnt + 1
    Next
    RowOutCrnt = RowOutCrnt + 1
  Next

  ' Copy data
  RowOutCrnt = RowDataFirst
  For RowInCrnt = RowDataFirst To RowInMax
    For ColInCrnt = NumNonYearCols + 1 To ColInMax Step NumColsPerYear
      ' This for-loop tracks the first column of each block of year columns
      If CellInValue(RowInCrnt, ColInCrnt) <> "" Then
        ' There is data for this year for this customer/product

        ' Copy non-year-linked data
        ColOutCrnt = 1
        For ColInCrnt2 = 1 To NumNonYearCols
          CellOutValue(RowOutCrnt, ColOutCrnt) = _
                                            CellInValue(RowInCrnt, ColInCrnt2)
          ColOutCrnt = ColOutCrnt + 1
        Next

        ' Copy year
        CellOutValue(RowOutCrnt, ColOutCrnt) = CellInValue(1, ColInCrnt)
        ColOutCrnt = ColOutCrnt + 1

        ' Copy year-linked data
        For ColInCrnt2 = ColInCrnt To ColInCrnt + NumColsPerYear - 1
          CellOutValue(RowOutCrnt, ColOutCrnt) = _
                                            CellInValue(RowInCrnt, ColInCrnt2)
          ColOutCrnt = ColOutCrnt + 1
        Next
        RowOutCrnt = RowOutCrnt + 1
      End If
    Next
  Next

  With Worksheets("Output")

    ' Delete any existing value
    .Cells.EntireRow.Delete

    ' Download contents of CellOutValue
    .Range(.Cells(1, 1), .Cells(RowOutMax, ColOutMax)).Value = CellOutValue

    'Set formatting.  Selection formats from the input worksheet were saved at
    ' the beginning.  Applying these formats to the output worksheet is not
    ' necessary but makes the process a little smoother.
    For RowOutCrnt = 1 To RowDataFirst - 1
      For ColOutCrnt = 1 To ColOutMax
        With .Cells(RowOutCrnt, ColOutCrnt)
          .Interior.Color = CellHeaderColourBack
          .Font.Color = CellHeaderColourFore
          .Font.Bold = CellHeaderBold
          .HorizontalAlignment = CellOutHeaderHAlign(ColOutCrnt)
        End With
      Next
    Next
    For ColOutCrnt = 1 To ColOutMax
      .Columns(ColOutCrnt).ColumnWidth = ColWidth(ColOutCrnt)
      .Range(.Cells(RowDataFirst, ColOutCrnt), _
             .Cells(RowOutMax, ColOutCrnt)).NumberFormat _
                                            = CellOutNumberFormat(ColOutCrnt)
    Next

  End With

  Debug.Print "Duration " & Timer - TimeStart

End Sub
Tony Dallimore
  • 12,335
  • 7
  • 32
  • 61
  • Thanks a lot for the detailed answer. I am a beginner to VBA and it would really help me if I see some sample code. I have a good experience with other languages like C++ and Java in the past. Hence, I can improvise if I see a sample. Can you help me with the Array implementation logic as that is the bit that I am not clear how to go about it - I read about it and considered it as a possible solution, but did not know how to go about it. Thanks again Tony for your time and efforts to help me. – Siraj Samsudeen Sep 27 '12 at 19:12
  • It is 21:30 here. I will get something together for you but I doubt I will finish coding this evening. – Tony Dallimore Sep 27 '12 at 20:23
  • No worries Tony. Plz take your time. I am grateful for your help – Siraj Samsudeen Sep 28 '12 at 01:28
  • thanks a lot Tony for your awesome effort. I am traveling and hence not able to read the code properly in the small screen of my iphone. I will try to go over the code and raise questions if I don't understand anything. I am very impressed with the amount of effort you have taken to write such good code with such a comprehensive code documentation. Is it possible for you to share the sample file with the macro code inside it so that I can step through it? – Siraj Samsudeen Sep 29 '12 at 15:20
  • Welcome. Most of the effort went into the documentation. The rearrangement of data is, for me, quite routine. If you go to my profile you will find an email address. If you send me an email, I will respond with a copy of the workbook in which I developed the code. – Tony Dallimore Sep 29 '12 at 15:33
0

I would use another approach: I would just add a the missing columns and delete the undesired ones.

Many of my macros start by creating or importing a table with all the data, then duplicate it, sort it, remove the undesired rows at the top or at the bottom (if the sort is smart enough, all the desired rows will be grouped together), remove the undesired columns, add a few columns, format, and repeat on all the sheets.

stenci
  • 8,290
  • 14
  • 64
  • 104