0

I can't seem to reshape one column of my data frame into the correct shape while retaining unique identifiers for each row. I have the following data

   id     x        y   indicator
1   1 249.6  1.124985        1 
2   1 250.9  1.124756        1 
3   1 252.2  1.124125        1 
4   1 253.5  1.124598        1 
5   1 254.8  1.127745        1 
6   1 256.1  1.129102        1 
7   2 249.6  2.167348        0   
8   2 250.9  2.165804        0   
9   2 252.2  2.164578        0  
10  2 253.5  2.163828        0  
11  2 254.8  2.164260        0   
12  2 256.1  2.166293        0 
13  3 249.6  0.04647765      0
14  3 250.9  0.04932262      0
15  3 252.2  0.05245448      0
15  3 253.5  0.05692405      0
17  3 254.8  0.06184551      0
18  3 256.1  0.06751989      0

I would like to reshape the y vector into a matrix where each row corresponds to a single y vector and there are additional columns for id and indicator, while the variable columns are labeled by the x values. Something like this:

id indicator  249.6      250.9      252.2      ...
1  1          1.124985   1.124756   1.124125   ...
2  0          2.167348   2.165804   2.164578   ...
3  0          0.04647765 0.04932262 0.05245448 ...

I've tried using the reshape function like this:

reshape(df[c('id','x','y')], direction = "wide", idvar = "id", timevar = "x")

In this case, I was just ignoring the indicator variable to see if it would work, but I got a data frame with only two columns back, the first column was ID, and the second was y.c(249.6, 250.9, 252.2, 253.5, 254.8, 256.1, 257.4, 258.7, 260, 261.3, 262.6, 263.9, 265.2, 266.5, 267.8, 269.1, 270.4, 271.7, 273, 274.3, 275.6, 276.9, 278.2, 279.5, 280.8, 282.1, 283.4, 284.7, 286, 287.3, 288.6, 289.9, 291.2, 292.5, 293.8, 295.1,[etc].

I also tried using the xtabs function: a = xtabs(formula = y ~ id + indicator+ x, data=df), but this just returned a table that looks very similar to the one I have as input.

CopyOfA
  • 767
  • 5
  • 19

3 Answers3

1

Isto é uma questão de reformatar de formato longo para largo. (Veja no SO em Inglês aqui).

library(dplyr)
library(tidyr)

df1 %>%
  pivot_wider(
    id_cols = c('id', 'indicator'),
    names_from = 'x',
    values_from = 'y'
  )
## A tibble: 3 x 8
#     id indicator `249.6` `250.9` `252.2` `253.5` `254.8` `256.1`
#  <int>     <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#1     1         1  1.12    1.12    1.12    1.12    1.13    1.13  
#2     2         0  2.17    2.17    2.16    2.16    2.16    2.17  
#3     3         0  0.0465  0.0493  0.0525  0.0569  0.0618  0.0675
 

Dados

df1 <- read.table(text = "
   id     x        y   indicator
1   1 249.6  1.124985        1 
2   1 250.9  1.124756        1 
3   1 252.2  1.124125        1 
4   1 253.5  1.124598        1 
5   1 254.8  1.127745        1 
6   1 256.1  1.129102        1 
7   2 249.6  2.167348        0   
8   2 250.9  2.165804        0   
9   2 252.2  2.164578        0  
10  2 253.5  2.163828        0  
11  2 254.8  2.164260        0   
12  2 256.1  2.166293        0 
13  3 249.6  0.04647765      0
14  3 250.9  0.04932262      0
15  3 252.2  0.05245448      0
16  3 253.5  0.05692405      0
17  3 254.8  0.06184551      0
18  3 256.1  0.06751989      0
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

Regarding the code in the question, indicator should be part of the idvar. Also if there are no other columns in df other than the 4 listed then df[...] could be shortened to df.

reshape(df[c('id','x','y','indicator')], direction = "wide", 
  idvar = c("id", "indicator"), timevar = "x")

giving:

   id indicator    y.249.6    y.250.9    y.252.2    y.253.5    y.254.8    y.256.1
1   1         1 1.12498500 1.12475600 1.12412500 1.12459800 1.12774500 1.12910200
7   2         0 2.16734800 2.16580400 2.16457800 2.16382800 2.16426000 2.16629300
13  3         0 0.04647765 0.04932262 0.05245448 0.05692405 0.06184551 0.06751989

Note

The input in reproducible form:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(249.6, 250.9, 252.2, 253.5, 
254.8, 256.1, 249.6, 250.9, 252.2, 253.5, 254.8, 256.1, 249.6, 
250.9, 252.2, 253.5, 254.8, 256.1), y = c(1.124985, 1.124756, 
1.124125, 1.124598, 1.127745, 1.129102, 2.167348, 2.165804, 2.164578, 
2.163828, 2.16426, 2.166293, 0.04647765, 0.04932262, 0.05245448, 
0.05692405, 0.06184551, 0.06751989), indicator = c(1L, 1L, 1L, 
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18"))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

So I quickly figured out my answer here using tidyr:

z = pivot_wider(df, id_cols = c("id","indicator"), names_from = "x", values_from = "y")

Rui Barradas's answer is the same approach, and this works perfectly.

CopyOfA
  • 767
  • 5
  • 19