2

I have a large dataset (>79000 rows). My dataset looks like this:

Col1 Sampl1 Sampl2 Sampl3
sp1 0.1 0.5 1.2
sp2 1 2.1 4
sp3 2 3 0.75
sp1 1 4 1

I want to combine the duplicate columns in the species column while also summing the values in the samples' columns so my dataset will look like this:

Col1 Sampl1 Sampl2 Sampl3
sp1 1.1 4.5 2.2
sp2 1 2.1 4
sp3 2 3 0.75

I have tried using the consolidate function in excel but it did not produce anything after loading (I would select my references and the "sum" function, I waited, and after it loaded nothing appeared in the new sheet).

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
Alex Gomez
  • 21
  • 2

3 Answers3

2

As commented above, this can be performed more elegantly using Power Query without any issue. May be there can be more eloquent solutions to this, but i will personally use Power Query here when dealing with larger datasets where i need to Group by duplicates and sum the respective values. Note that this is a one time approach, next time when you add new data to your source data, you just need to refresh the table imported using Power Query.


So that said, you can follow the steps to perform the task using Power Query

• Select some cell in your Data Table,

Data Tab => Get&Transform => From Table/Range,

• When the PQ Editor opens: Home => Advanced Editor,

• Make note of all the Tables Names,

• Paste the M Code below in place of what you see.


Approach One:

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Grouped Rows" = Table.Group(Source, {"Col1"}, {{"Sample1", each List.Sum([Sampl1]), type number}, {"Sample2", each List.Sum([Sampl2]), type number}, {"Sample3", each List.Sum([Sampl3]), type number}})
in
    #"Grouped Rows"

Or Approach Two:

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Col1"}, "Attribute", "Value"),
    #"Pivoted Column" = Table.Pivot(#"Unpivoted Other Columns", List.Distinct(#"Unpivoted Other Columns"[Attribute]), "Attribute", "Value", List.Sum)
in
    #"Pivoted Column"

You can use either of the one, but using the Second One will be more succinct.

• Change the Table name if required before importing it back into Excel.

• From Home Tab => Close & Load => Close & Load To

• When importing, you can either select Existing Sheet with the cell reference you want to place the table or you can simply click on NewSheet


enter image description here


Approach One --> Screenshots

enter image description here


Approach Two --> Screenshots

enter image description here


An Excel Formula alternative approach using New MS365 functions. -->

REDUCE UNIQUE BYCOL LAMBDA VSTACK HSTACK

enter image description here


• Formula used in cell F1

=REDUCE(A1:D1,UNIQUE(A2:A5),
LAMBDA(x,y,VSTACK(x,HSTACK(y,
BYCOL(B2:D5,LAMBDA(m,SUM((A2:A5=y)*m)))))))

The formula approach also works fine for me, Tested with 86017 rows of data and here is a sample workbook.

enter image description here


Mayukh Bhattacharya
  • 12,541
  • 5
  • 21
  • 32
1

Simpler formulas are UNIQUE and SUMIFS:

F5 =UNIQUE(A5:A8)

G5 =SUMIF($A$5:$A$8,$F5,B$5:B$8) - drag or copy this as far as you need.

Or create a pivot table with Col1 in the Rows section, and Sampl1, Sampl2, etc. in the Values section.

enter image description here

kevin
  • 1,357
  • 1
  • 4
  • 10
1

Sum Up Unique Data

enter image description here

=LET(data,A2:D7,
    du,INDEX(data,,1),u,UNIQUE(du),dv,DROP(data,,1),
    dr,DROP(REDUCE("",SEQUENCE(ROWS(u)),LAMBDA(rr,r,
        LET(f,FILTER(dv,du=INDEX(u,r)),
            VSTACK(rr,BYCOL(f,LAMBDA(c,SUM(c))))))),1),
HSTACK(u,dr))
VBasic2008
  • 44,888
  • 5
  • 17
  • 28