1

I have a lot of rows of data in this format (thousands of rows!)

They are generated from a Query

Seen Loc Qty  
robi r 5  
robi w 2  
robi p 1  
spar r 3  
spar w 1  
bb r 10  
bb  p 1  
cauv r 5  
wate r 6  
wate p 1 

And I want to get them into this format (row format)

Seen    Loc(r)  Loc(w)  Loc(p)  
robi    5       2       1  
spar    3       1       0  
bb     10       0       1  
cauv    5       0       0  
wate    6       0       1  

Can anyone tell me how I can achieve that in a reasonably performance friendly manner because of the number of rows (max 25,000)?

Ideally would like the row formatted output to be created within the same process as the Query that creates the columns (i.e. =newrowprocess(existingQuery) .... creates the row formatted output

I have tried using Arrays and IF() statements within Arrays.

Thanks

roger
  • 79
  • 1
  • 6
  • 2
    Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – GWD Mar 15 '20 at 12:26
  • @GWD Didn't notice the [r] tag. If OP needs an R solution, I will delete my answer. – DenverCoder1 Mar 15 '20 at 12:31
  • The input columns are the output of a query! Ideally I would like to get the transformation into rows done all in one go, i.e. =new row transformation(existing query that creates the columns) .... creates the row format output? – roger Mar 15 '20 at 12:35
  • 1
    @roger closing remark: 25.000 is not a number of rows where performance is an issue when working with R, maybe if you are using a raspberry pi as your machine, but 25 thsd not even on a pi; if you are thinking about serious number of rows (> hundreds of mio., so that is when performance plays a role) you should push that operation to the database eg like here https://stackoverflow.com/questions/15931607/convert-rows-to-columns-using-pivot-in-sql-server – GWD Mar 15 '20 at 12:46
  • Thanks re performance ... but even a few thousand rows results in my computer taking a long time to crank the data - I'm not a programmer, just a user so I daresay some the data isn't structured perfect! – roger Mar 15 '20 at 13:01

2 Answers2

2

Assuming the list of Seen values and possible letters is static*:

=SUMIFS($C:$C,$A:$A,$E2,$B:$B,"r")

=SUMIFS($C:$C,$A:$A,$E2,$B:$B,"w")

=SUMIFS($C:$C,$A:$A,$E2,$B:$B,"q")

$E2 assumes the value to look up in column E.

A is the column to sum, C is the Seen column, B is the Loc column.

enter image description here

*If not, you can use array formulas to get unique values from a list.

DenverCoder1
  • 2,253
  • 1
  • 10
  • 23
2

Using pivot_wider from the tidyr package:

pivot_wider(df, values_from = Qty, names_from = Loc, values_fill = list(Qty = 0))
#> # A tibble: 4 x 4
#>   Seen      r     w     p
#>   <fct> <int> <int> <int>
#> 1 robi      5     2     1
#> 2 spar      3     1     0
#> 3 bb       10     0     1
#> 4 cauv      5     0     0

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87