0

Currently have an input file of the format shown below

enter image description here

The expected output should

enter image description here

How to achieve this thru Excel?

Sweety
  • 75
  • 1
  • 2
  • 13

4 Answers4

2

If you have Office 365 then you can try below formula. Otherwise using VBA is better bet.

=IFERROR(FILTERXML("<t><d>"&TEXTJOIN("",TRUE,IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>",""),IF($B$2:$B$6<>"",$B$1&"</d><d>"&$B$2:$B$6&"</d><d>",""),IF($C$2:$C$6<>"",$B$1&"</d><d>"&$C$2:$C$6&"</d><d>",""),IF($D$2:$D$6<>"",$D$1&"</d><d>"&$D$2:$D$6&"</d><d>",""))&"</d></t>","//d["&ROWS($A$1:$A1)&"]"),"")

First portion of this formula is IF where we build a concatenated expression for each column which builds up data in PIN1</d><d>main1.txt</d><d> form:

IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>","")

All 4 columns are joined together by TEXTJOIN formula to build a valid XML data. And then we extract using the FILTERXML formula. This may seem a little complicated to begin with but it is fairly straightforward once you read information in below link:

Excel - Extract substring(s) from string using FILTERXML

shrivallabha.redij
  • 5,832
  • 1
  • 12
  • 27
  • Thank you for the solution. I was able to display the col name after each data using IF($A$2:$A$6<>"",$A$1&""&$A$2:$A$6&"",""). However when the entire function was used (FILTERXML ...) , only PIN1 was displayed. – Sweety Dec 29 '20 at 11:50
  • @Sweety Sorry, I did not understand your comment. – shrivallabha.redij Dec 29 '20 at 17:06
  • In the actual data set, the column range is from A...BOI (its huge data set). I applied the formula you had given me with minor amendments (added few more cols beyond col D) on the formula. However when I executed the formula, only the first column name (PIN1) was shown. The rest of the data was not shown. – Sweety Dec 30 '20 at 01:18
2

Here's a VBA solution to your problem. It can handle any number of columns and any number of rows. It would require modification if your columns aren't blank below the significant data shown in your sample.

Sub ColumnsToList()
    ' 148

    Dim Arr     As Variant              ' data array
    Dim Fun     As Variant              ' output array
    Dim C       As Long                 ' loop counter: columns
    Dim R       As Long                 ' loop counter: rows
    Dim i       As Long                 ' Fun index
    
    With ActiveSheet.UsedRange
        Arr = .Value
        ReDim Fun(1 To .Cells.Count * 2)
    End With
    
    For C = 1 To UBound(Arr, 2)
        For R = 2 To UBound(Arr)
            If Len(Arr(R, C)) Then
                i = i + 2
                Fun(i) = Arr(R, C)
                Fun(i - 1) = Arr(1, C)
            End If
        Next R
    Next C
    
    If i Then
        ReDim Preserve Fun(1 To i)
        With ActiveSheet
            ' this specifies the first empty column on the source sheet for output
            ' specify another cell like this:-
            ' Worksheets("Sheet1").cells(1, "A").Resize( ... continue as below
            .Cells(1, .UsedRange.Columns.Count + 1).Resize(UBound(Fun)).Value = Application.Transpose(Fun)
        End With
    End If
End Sub

Edit 30 Dec 2020 In response to the claim that the above code returns #N/A errors from column 1750 onward I used the code below to create a set of data which I believe might be similar to the actual data you use.

Private Sub CreateData()
    ' 148

    Dim C       As Long
    Dim R       As Long
    Dim L       As Integer
    
    Application.ScreenUpdating = False
    For C = 1 To 5000
        Cells(1, C).Value = "PIN " & C
        For R = 2 To Int(4 * Rnd) + 2
            With Cells(R, C)
                .Value = .Address(0, 0)
            End With
        Next R
    Next C
    Application.ScreenUpdating = True
End Sub

I then ran my above procedure ColumnsToList on the data thus created. I was amazed at the speed with which more than 25000 rows were produced, instantly and without any errors.

Variatus
  • 14,293
  • 2
  • 14
  • 30
  • Thank you for the script. I tested this and after some column onwards I could see #NA. There are closer to 5000 values. I executed the code as it is. The last data is in column GOI and from col BOH onwards the data was shown as #NA. This is the first time using VBA. – Sweety Dec 29 '20 at 12:01
  • I was unable to reproduce the error you describe. Please look at the test I conducted in my answer above. Therefore I suggest you look at your data starting in column BOH. They may be different. In fact, they should themselves contain the #N/A error. If this isn't the case please look at your Excel version. I'm using Excel 365 and your version may have limits to the size of arrays I don't have. Consider splitting your data into blocks of less than 1750 columns in that case, which you can process separately and combine the results. – Variatus Dec 30 '20 at 00:19
1

Here is a Power Query solution (available in Excel 2010+).

  • It should adapt to any changes in number of rows or columns.
  • It should also ignore blank entries in the table.

See the comments in the code for the algorithm, and explore the Applied Steps window to see what happens at each step.

To open the PQ editor, in later versions of Excel

  • select some cell in the data table
  • Data => Get & Transform => From Table/Range
  • Be sure to change the Table name in line 4 to match the real name in your workbook

You can then paste the code into the Advanced Editor accessible from the HOME / Query tab of the UI.

M Code

let

//Change table name in next line to match the REAL table name in your workbook
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

//Unpivot all the columns to generate a two column table    
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),

//Sort by Attribute (column Header), then by Value (column data)
//May need to create a Custom Sort for the data if it does not sort readily into what you want
    #"Sorted Rows" = Table.Sort(#"Unpivoted Columns",{{"Attribute", Order.Ascending}, {"Value", Order.Ascending}}),

//combine the two columns into an alternating List of Header/Data
        zipList = List.Zip({Table.Column(#"Sorted Rows","Attribute"),
                            Table.Column(#"Sorted Rows","Value")}),
    #"Converted to Table" = Table.FromList(zipList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

//Expand the list into new rows
    #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
    
in
    #"Expanded Column1"

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Thank you Ron. The script worked and should I keep scrolling to access 50,000 records ? I manually keep scrolling to check the data. The option "Copy Entire Table" does not give the entire data sets. – Sweety Dec 29 '20 at 13:47
  • @Sweety not sure I understand exactly what you mean. Just save and load to a worksheet and examine that. If you mean something else, please clarify. Where are you seeing this copy entire table option? – Ron Rosenfeld Dec 29 '20 at 14:20
  • initially when I executed this script, I could see only 2000 records. I saved the worksheet and now I can see the entire data sets . Thank you for the suggestion. – Sweety Dec 30 '20 at 01:09
0

This is a very fiddly problem to do in Excel efficiently... but I have an inefficient solution for you.

If you know the range you're extracting from, this will provide you with the right answer:

=LET(
     InputRange, Sheet1!A1:D7,

     InputColumnCount, COLUMNS(InputRange),
     CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
     InputRowCount, ROWS(InputRange)-1,

     RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
     RowIndexSkip, MOD(RowSequence,2),
     RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
     RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
     ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,

     UnfilteredResult,
     IF(
          RowIndexSkip=0,
          INDEX(InputRange,,ColumnIndex),
          INDEX(InputRange,RowIndex,ColumnIndex)
      ),

     FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)

So, what is this doing? I'm using a LET function to make it a bit more readable.

The first steps are to take in the input range, and calculate how many rows and columns we have, and store those in variables (InputRowCount and InputColumnCount, respectively):

=LET(
     InputRange, Sheet1!A1:D7,

     InputColumnCount, COLUMNS(InputRange),
     CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
     InputRowCount, ROWS(InputRange)-1,

The next step will be to design a number of ranges to help us. Excel now handles dynamic ranges, so, we can generate some ranges to help us pick out the right data

I use a lot of MOD functions. MOD is basically the remainder after division, and if you take the MOD of a default SEQUENCE, the result will be a list of numbers that increase from 0 to one less than the divisor, and then it will go back to 0 and rise again. So, if I were to take MOD(SEQUENCE(Rows * Columns,, 0), Rows), the resulting list of numbers will increase from 0 to Rows, Column times.

That's what these are for:

RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0) RowIndexSkip, MOD(RowSequence,2) RowSequenceSkip, (RowSequence-RowIndexSkip)/2 RowIndex, MOD(RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount + 1) ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1

RowIndexSkip is basically a list of alternating 0's and 1's, that we're going to use below to track whether we're inserting a header or a data item.

And for the final options, bringing it all together:

     UnfilteredResult,
     IF(
          RowIndexSkip=0,
          INDEX(InputRange,,ColumnIndex),
          INDEX(InputRange,RowIndex,ColumnIndex)
      ),

     FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))

The above basically creates a listing of every data item, interleaved with the headers, as you requested. However… we still have to apply a filter, because your original data shows that there can be blank spaces.

If, on the other hand, you only know the columns, but you don't know how many rows, this should do the trick:

=LET(
     InputRange, Sheet1!A:D,

     InputColumnCount, COLUMNS(InputRange),
     CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
     CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
     InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,

     RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
     RowIndexSkip, MOD(RowSequence,2),
     RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
     RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
     ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,

     UnfilteredResult,
     IF(RowIndexSkip=0,
          INDEX(InputRange,,ColumnIndex),
          INDEX(InputRange,RowIndex,ColumnIndex)
     ),
     FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)

This time, you can see that I've slightly changed how it finds the final row.

CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,

That MMULT isn't very efficient, but it will at least ensure that this formula won't be thrown off by blank spaces or columns with different amounts of data.

And finally, if you don't know how many columns OR rows there are, but only know the name of the sheet, this is how I would do it:

=LET(
     InputSheet, "Sheet1",
     InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),

     Column1Counter, MOD(InputColumnCount-1,26)+1,
     Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
     Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
     LastColumnLetters,
     IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
     IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
     CHAR(Column1Counter+64),

     LongRange, INDIRECT(InputSheet&"!A:"&LastColumnLetters),

     CheckIsBlank_FullRange, 1-ISBLANK(LongRange),
     CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(InputColumnCount,,1,0)),
     InputRowCount, MAX(FILTER(SEQUENCE(ROWS(LongRange)),CheckIsBlank_ByRow>0)),

     InputRange, INDIRECT(InputSheet&"!A1:"&LastColumnLetters&InputRowCount),

     RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
     RowIndexSkip, MOD(RowSequence,2),
     RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
     RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
     ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,

     UnfilteredResult,
     IF(RowIndexSkip=0,
          INDEX(InputRange,,ColumnIndex),
          INDEX(InputRange,RowIndex,ColumnIndex)
     ),
     FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)

This code is even slower and more confusing, because there's so much to be done, but it will check to see that it has every column. It does it with this:

InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),

This code searches the first row of the target worksheet, looking for the last column. It does this with the -1 command in the XMATCH, this tells excel to start looking from the right/bottom, and work up/left.

Then I've included some code for addressing the relevant columns:

Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),

You could use an INDEX to generate this, but it will actually slow down quite a bit, because Excel would be essentially loading all 1048576 * ColumnCount cells into memory (and this code already does this once... no need to do it twice).