Currently have an input file of the format shown below
The expected output should
How to achieve this thru Excel?
Currently have an input file of the format shown below
The expected output should
How to achieve this thru Excel?
If you have Office 365 then you can try below formula. Otherwise using VBA is better bet.
=IFERROR(FILTERXML("<t><d>"&TEXTJOIN("",TRUE,IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>",""),IF($B$2:$B$6<>"",$B$1&"</d><d>"&$B$2:$B$6&"</d><d>",""),IF($C$2:$C$6<>"",$B$1&"</d><d>"&$C$2:$C$6&"</d><d>",""),IF($D$2:$D$6<>"",$D$1&"</d><d>"&$D$2:$D$6&"</d><d>",""))&"</d></t>","//d["&ROWS($A$1:$A1)&"]"),"")
First portion of this formula is IF
where we build a concatenated expression for each column which builds up data in PIN1</d><d>main1.txt</d><d>
form:
IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>","")
All 4 columns are joined together by TEXTJOIN
formula to build a valid XML data. And then we extract using the FILTERXML
formula. This may seem a little complicated to begin with but it is fairly straightforward once you read information in below link:
Here's a VBA solution to your problem. It can handle any number of columns and any number of rows. It would require modification if your columns aren't blank below the significant data shown in your sample.
Sub ColumnsToList()
' 148
Dim Arr As Variant ' data array
Dim Fun As Variant ' output array
Dim C As Long ' loop counter: columns
Dim R As Long ' loop counter: rows
Dim i As Long ' Fun index
With ActiveSheet.UsedRange
Arr = .Value
ReDim Fun(1 To .Cells.Count * 2)
End With
For C = 1 To UBound(Arr, 2)
For R = 2 To UBound(Arr)
If Len(Arr(R, C)) Then
i = i + 2
Fun(i) = Arr(R, C)
Fun(i - 1) = Arr(1, C)
End If
Next R
Next C
If i Then
ReDim Preserve Fun(1 To i)
With ActiveSheet
' this specifies the first empty column on the source sheet for output
' specify another cell like this:-
' Worksheets("Sheet1").cells(1, "A").Resize( ... continue as below
.Cells(1, .UsedRange.Columns.Count + 1).Resize(UBound(Fun)).Value = Application.Transpose(Fun)
End With
End If
End Sub
Edit 30 Dec 2020 In response to the claim that the above code returns #N/A errors from column 1750 onward I used the code below to create a set of data which I believe might be similar to the actual data you use.
Private Sub CreateData()
' 148
Dim C As Long
Dim R As Long
Dim L As Integer
Application.ScreenUpdating = False
For C = 1 To 5000
Cells(1, C).Value = "PIN " & C
For R = 2 To Int(4 * Rnd) + 2
With Cells(R, C)
.Value = .Address(0, 0)
End With
Next R
Next C
Application.ScreenUpdating = True
End Sub
I then ran my above procedure ColumnsToList
on the data thus created. I was amazed at the speed with which more than 25000 rows were produced, instantly and without any errors.
Here is a Power Query solution (available in Excel 2010+).
See the comments in the code for the algorithm, and explore the Applied Steps window to see what happens at each step.
To open the PQ editor, in later versions of Excel
Data
=> Get & Transform
=> From Table/Range
You can then paste the code into the Advanced Editor accessible from the HOME / Query tab of the UI.
M Code
let
//Change table name in next line to match the REAL table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Unpivot all the columns to generate a two column table
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
//Sort by Attribute (column Header), then by Value (column data)
//May need to create a Custom Sort for the data if it does not sort readily into what you want
#"Sorted Rows" = Table.Sort(#"Unpivoted Columns",{{"Attribute", Order.Ascending}, {"Value", Order.Ascending}}),
//combine the two columns into an alternating List of Header/Data
zipList = List.Zip({Table.Column(#"Sorted Rows","Attribute"),
Table.Column(#"Sorted Rows","Value")}),
#"Converted to Table" = Table.FromList(zipList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
//Expand the list into new rows
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
in
#"Expanded Column1"
This is a very fiddly problem to do in Excel efficiently... but I have an inefficient solution for you.
If you know the range you're extracting from, this will provide you with the right answer:
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
So, what is this doing? I'm using a LET
function to make it a bit more readable.
The first steps are to take in the input range, and calculate how many rows and columns we have, and store those in variables (InputRowCount
and InputColumnCount
, respectively):
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
The next step will be to design a number of ranges to help us. Excel now handles dynamic ranges, so, we can generate some ranges to help us pick out the right data
I use a lot of MOD
functions. MOD
is basically the remainder after division, and if you take the MOD
of a default SEQUENCE
, the result will be a list of numbers that increase from 0 to one less than the divisor, and then it will go back to 0 and rise again. So, if I were to take MOD(SEQUENCE(Rows * Columns,, 0), Rows)
, the resulting list of numbers will increase from 0
to Rows
, Column
times.
That's what these are for:
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0)
RowIndexSkip, MOD(RowSequence,2)
RowSequenceSkip, (RowSequence-RowIndexSkip)/2
RowIndex, MOD(RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount + 1)
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1
RowIndexSkip
is basically a list of alternating 0
's and 1
's, that we're going to use below to track whether we're inserting a header or a data item.
And for the final options, bringing it all together:
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
The above basically creates a listing of every data item, interleaved with the headers, as you requested. However… we still have to apply a filter, because your original data shows that there can be blank spaces.
If, on the other hand, you only know the columns, but you don't know how many rows, this should do the trick:
=LET(
InputRange, Sheet1!A:D,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This time, you can see that I've slightly changed how it finds the final row.
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
That MMULT
isn't very efficient, but it will at least ensure that this formula won't be thrown off by blank spaces or columns with different amounts of data.
And finally, if you don't know how many columns OR rows there are, but only know the name of the sheet, this is how I would do it:
=LET(
InputSheet, "Sheet1",
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
LongRange, INDIRECT(InputSheet&"!A:"&LastColumnLetters),
CheckIsBlank_FullRange, 1-ISBLANK(LongRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(InputColumnCount,,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(LongRange)),CheckIsBlank_ByRow>0)),
InputRange, INDIRECT(InputSheet&"!A1:"&LastColumnLetters&InputRowCount),
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This code is even slower and more confusing, because there's so much to be done, but it will check to see that it has every column. It does it with this:
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
This code searches the first row of the target worksheet, looking for the last column. It does this with the -1
command in the XMATCH
, this tells excel to start looking from the right/bottom, and work up/left.
Then I've included some code for addressing the relevant columns:
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
You could use an INDEX
to generate this, but it will actually slow down quite a bit, because Excel would be essentially loading all 1048576 * ColumnCount
cells into memory (and this code already does this once... no need to do it twice).