How to summarise a data frame based on values occurring at unique data points in a particular column?

Question

I've got a data frame df given by:

r  t s      v
1  1 a   4.50
2  1 b   3.00
3  2 c   3.22
4  3 d   2.00
5  3 a   5.00
6  1 c   1.00
7  1 f  14.00
8  2 b 144.00
9  3 c   2.00
10 4 a  22.00
11 2 a   2.20
12 3 e 232.00
13 4 g  45.00
14 3 g   4.30
15 3 b   3.20
16 4 b   2.00
17 4 c   2.60

and I want to convert this into another data frame df1 as

r t    a     b    c  d   e    f  g
1 1  4.5   3.0 1.00 NA  NA 14.0 NA
2 2  2.2 144.0 3.22 NA  NA   NA NA
3 3  5.0   3.2 2.00  2 232 NA 4.3
4 4 22.0   2.0 2.60 NA  NA NA 45.0

where the column names of df1 are the unique values of the s column in df, and each row collects the v values for one value of the t column.

There won’t be any duplicates of ‘s’ within a ‘t’, so it can be assumed that each ‘s’ appears at most once for every ‘t’ value.

Is there an easy way (using dplyr or similar) to manipulate the data in df to get df1?

When you have multiple rows with the same values of `t` & `s`, what do you want to do with the `v`'s? Would you want to average them, sum them, take the median, etc? — gung - Reinstate Monica, Feb 11 '19 at 20:05
Try `library(reshape2); dcast(dat[-1], t ~ s)` then add column `r` if needed. Your expected output for cols `f` and `g` seems not correct. — markus, Feb 11 '19 at 20:08
Possible duplicate of [How to reshape data from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — markus, Feb 11 '19 at 20:08
I also used `reshape2`: `df1 <- dcast(df, t ~ s, value.var = "v")` . Example result is off. — nycrefugee, Feb 11 '19 at 20:16

score 0 · Accepted Answer · answered Feb 11 '19 at 20:18

I'm inferring that you might be having problems using answers from the duplicate target because you have included the row number as a column. We can spread the table without the r column. Note that two values in rows 3 and 4 of your example output (4.3 and 45.0) appear to be misplaced: they belong under g, not f.

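library(tidyverse)
# read_table2() parses whitespace-separated text
# (deprecated since readr 2.0.0 in favour of read_table())
tbl <- read_table2(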
"t s      v
1 a   4.50
1 b   3.00
2 c   3.22
3 d   2.00
3 a   5.00
1 c   1.00
1 f  14.00
2 b 144.00
3 c   2.00
4 a  22.00
2 a   2.20
3 e 232.00
4 g  45.00
3 g   4.30
3 b   3.20
4 b   2.00
4 c   2.60"
)
# One column per distinct value of s, filled with v;
# (t, s) pairs that never occur become NA
tbl %>%
  spread(s, v)
#> # A tibble: 4 x 8
#>       t     a     b     c     d     e     f     g
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1   4.5   3    1       NA    NA    14  NA  
#> 2     2   2.2 144    3.22    NA    NA    NA  NA  
#> 3     3   5     3.2  2        2   232    NA   4.3
#> 4     4  22     2    2.6     NA    NA    NA  45

Created on 2019-02-11 by the reprex package (v0.2.1)
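
For readers on newer tidyverse versions: spread() has been superseded by pivot_wider() since tidyr 1.0.0. The following is a minimal sketch of the same reshape, not part of the original reprex. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the `.before` argument of mutate()), rebuilds the question's data with data.frame(), and re-creates the r column as a plain row number, which is my reading of what r means in df1.

```r
library(dplyr)
library(tidyr)

# The question's data, without the row-number column r
df <- data.frame(
  t = c(1, 1, 2, 3, 3, 1, 1, 2, 3, 4, 2, 3, 4, 3, 3, 4, 4),
  s = c("a", "b", "c", "d", "a", "c", "f", "b", "c",
        "a", "a", "e", "g", "g", "b", "b", "c"),
  v = c(4.5, 3, 3.22, 2, 5, 1, 14, 144, 2,
        22, 2.2, 232, 45, 4.3, 3.2, 2, 2.6)
)

df1 <- df %>%
  arrange(s, t) %>%                                  # new columns appear in order of first occurrence, so sort by s first
  pivot_wider(names_from = s, values_from = v) %>%   # one column per distinct s; absent (t, s) pairs become NA
  arrange(t) %>%                                     # rows in increasing t
  mutate(r = row_number(), .before = 1)              # assumption: r is just the row number of the result

df1
```

If a (t, s) pair did occur more than once, pivot_wider() would return list-columns with a warning; passing an aggregation such as `values_fn = mean` is one way to collapse the duplicates, which ties in with the question raised in the comments about what to do with repeated values.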

Hi, we can assume that there won’t be any duplicates for ‘s’ in each ‘t’ so only one value of ‘s’ is present per ‘t’ value. — user1809989, Feb 12 '19 at 09:18
