2

I need to reshape data.frame in R in one step. In short, change of values of objects (x1 to x6) is visible row by row (from 1990 to 1995):

> tab1[1:10, ] # raw data see plot for tab1
   id value year
1  x1     7 1990
2  x1    10 1991
3  x1    11 1992
4  x1     7 1993
5  x1     3 1994
6  x1     1 1995
7  x2     6 1990
8  x2     7 1991
9  x2     9 1992
10 x2     5 1993

I am able to do reshaping step by step, does anybody know how do it in one step?

Original data Table 1 - see that minimal value from all timeseries is "0"

Step1: Table 2 - rescale each timeseries that each would have minimal value equal "0". All times fall down on x-axes.

Step2: Table 3 - apply diff() function on each timeline.

Step3: Table 4 - apply sort() function on each timeseries.

I hope the pictures are clear enough for understanding each step.

So final table looks like this:

> tab4[1:10, ]
   id value time
1  x1    -4    1
2  x1    -4    2
3  x1    -2    3
4  x1     1    4
5  x1     3    5
6  x2    -4    1
7  x2    -3    2
8  x2     1    3
9  x2     1    4
10 x2     2    5

enter image description here

# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,
                             0,1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
                   year = rep(c(1990:1995), times = 6))

tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
                             0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
                   year = rep(c(1990:1995), times = 6))

tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
                             1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
                   time = rep(c(1:5), times = 6))

tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
                             -3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
                   time = rep(c(1:5), times = 6))
Ladislav Naďo
  • 822
  • 12
  • 27
  • What do you mean by "I am able to do reshaping step by step, does anybody know how do it in one step". Please show what you have tried and where you got stuck. Sharing your attempts saves us from reiterating obvious answers, and it helps you get a more specific and relevant answer. – Henrik Jul 15 '14 at 15:16
  • Hi @Henrik. I've created each timeseries (vector) separately and applied all steps on each vectors. I never do the procedure to a data.frame object (and I do not know how). – Ladislav Naďo Jul 15 '14 at 15:19

2 Answers2

3

Using data.table, this is simply:

require(data.table) ## 1.9.2
ans <- setDT(tab1)[, list(value=diff(value)), by=id]  ## aggregation
setkey(ans, id,value)[, time := seq_len(.N), by=id] ## order + add 'time' column

Note that your 'step 1' is unnecessary as your second step is calculating difference and it wouldn't have any effect (and is therefore skipped here).

Arun
  • 116,683
  • 26
  • 284
  • 387
2

It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R (from base R by and tapply to add-on packages like plyr, data.table, and dplyr). I've been learning how to use package dplyr, and came up with the following solution.

require(dplyr)

tab4 = tab1 %>%
    group_by(id) %>% # group by id
    mutate(value = value - min(value), value = value - lag(value)) %>% # group min to 0, difference lag 1
    na.omit %>% # remove NA caused by lag 1 differencing
    arrange(id, value) %>% # order by value within each id
    mutate(time = 1:length(value)) %>% # Make a time variable from 1 to 5 based on current order
    select(-year) # remove year column to match final OP output
aosmith
  • 34,856
  • 9
  • 84
  • 118