0

This is a follow up to this question: Return values in sucessive years

@akrun asked me to post a new question

A sample of my data can be found here: https://dl.dropboxusercontent.com/u/5699030/SuccessiveYears.xlsx

Or here it is in a dataframe

id <- c(1,1,1,2,3,4,4,5,5,5,6,7,7,8,8,8,8,9,9,9,10,10,11,12,12,12,13,13,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15)
Year <- c(2010,2011,2012,2010,2013,2012,2013,2010,2011,2012,2013,2011,2013,2011,2012,2014,2015,2010,2011,2013,2010,2012,2013,2010,2011,2013,2011,2012,2011,2012,2013,2014,2015,2010,2010,2011,2011,2012,2012,2013,2013,2014,2014,2015,2015)
Class <- c("c","c","c","c","a","c","b","c","c","c","b","c","b","d","f","f","f","c","c","c","c","c","b","c","c","b","c","c","c","c","b","c","b","e","d","e","d","e","d","e","d","e","d","e","d")
ValueA <- c(0.1,0.12,0.15,0.21,0.09,0,0.04,0.05,0.27,0.1,0,0,0,0.22,0.18,0,0.11,0,0,0.06,0,0,0,0.19,0.23,0.18,0,0.08,0,0,0.02,0.08,0.09,0.51,0.99,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05,1.04,0.77)
ValueB <- c(0.42,0.58,0.38,0.26,0.28,0.28,0.27,0.22,0.27,0.2,0.11,0,0.05,0.34,0.27,0,0.11,0,0,0.06,0,0,0,0.38,0.31,0.35,0.15,0.17,0.13,0.04,0.07,0.08,0.28,0.92,1.34,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49,1.33,1.07)
ValueC <- c(1.3,0.81,1.07,1.16,3.04,0.28,0.18,1.09,1.42,1.19,0.11,0.2,0.45,1.79,1.08,0.59,0.66,0.35,0.3,0.17,0.44,0.57,0.65,2.32,2.45,2.01,3.24,2.45,0.73,0.5,0.4,0.56,0.56,5.96,5.88,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98,5.7,5.12)
df <- data.frame(id, Year, Class, ValueA, ValueB, ValueC)
df

For every combination of id and class, where there are values for two successive years I would like the ValueA, ValueB and ValueC returned in a row for each of those two successive years. All of the column headers for the subsequent year values should be prefixed with "Y2".

Each id can have values for more than one class in any one year, hence why we need to look at the combination of id and class.

The output should look as follows:

id <- c(1,1,5,5,8,9,13,14,15,15,15,15,15,15,15,15,15,15)
Year <- c(2010,2011,2010,2011,2014,2010,2011,2011,2010,2010,2011,2011,2012,2012,2013,2013,2014,2014)
Class <- c("c","c","c","c","f","c","c","c","e","d","e","d","e","d","e","d","e","d")
ValueA <- c(0.1,0.12,0.05,0.27,0,0,0,0,0.51,0.99,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05)
ValueB <- c(0.42,0.58,0.22,0.27,0,0,0.15,0.13,0.92,1.34,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49)
ValueC <- c(1.3,0.81,1.09,1.42,0.59,0.35,3.24,0.73,5.96,5.88,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98)
Y2Year <- c(2011,2012,2011,2012,2015,2011,2012,2012,2011,2011,2012,2012,2013,2013,2014,2014,2015,2015)
Y2Class <- c("c","c","c","c","f","c","c","c","e","d","e","d","e","d","e","d","e","d")
Y2ValueA <- c(0.12,0.15,0.27,0.1,0.11,0,0.08,0,0.71,0.87,0.97,0.95,1.38,0.84,0.53,1.05,1.04,0.77)
Y2ValueB <- c(0.58,0.38,0.27,0.2,0.11,0,0.17,0.04,0.98,1.18,1.06,1.29,1.74,1.15,0.75,1.49,1.33,1.07)
Y2ValueC <- c(0.81,1.07,1.42,1.19,0.66,0.3,2.45,0.5,5.27,5.01,5.44,6.09,5.5,6.08,3.92,4.98,5.7,5.12)
df1 <- data.frame(id, Year, Class, ValueA, ValueB, ValueC, Y2Year, Y2Class, Y2ValueA, Y2ValueB, Y2ValueC)
df1
Community
  • 1
  • 1
cookie
  • 15
  • 3
  • Rather than link to a file and try to explain what you want (certainly do that as well), it's best if we can see a small example dataset here, along with the output you're after. Your description seems OK, but I'm personally not getting a clear picture of what you're after. Seeing a small sample of the output you'd _like_ to see will make that much clearer. – rosscova Jul 31 '16 at 10:17
  • The output is in the second file that I linked; I thought that would have been an OK way to present it. is that a no no? https://dl.dropboxusercontent.com/u/5699030/Output.xlsx – cookie Jul 31 '16 at 10:49
  • 1
    @rosscova Original post edited to include data and outputs as dataframes. Hope that is better etiquette. – cookie Jul 31 '16 at 14:50
  • An easier way to include the data would be to read your Excel file into R, e.g. `df <- read_excel("~/Downloads/SuccessiveYears.xlsx")` from the `readxl` package, "print" the data using `dput(df)`, and then copy-paste the output `structure(...)` into your question. See the [FAQ](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) here. – Weihuang Wong Jul 31 '16 at 15:22
  • @WeihuangWong Thanks for explaining how to include the data here. It took me quite a while to create that dataframe manually, so that shortcut is most appreciated – cookie Jul 31 '16 at 15:52

1 Answers1

0

First I use dlply to split the dataset into groups identified by id and Class:

library(plyr)
grouped <- dlply(df, .(id, Class), function(x) return(x[order(x$Year), ]))

Then, apply the consecutive years logic, using merge to join data from consecutive years, and returning the result as a list:

out <- lapply(grouped, function(x) {
    if (nrow(x)>1) {
        out <- lapply(1:(nrow(x)-1), function(i) {
            if ((x$Year[i]+1)==(x$Year[i+1])) {
                merge(x[i,], x[i+1,], by=c("id", "Class"))
            } else {
                NULL
            }
        })      
    }
})

Finally, bind the results into a dataframe:

final <- do.call(rbind, lapply(out, function(x) {
    if(length(x)>=1) {
        do.call(rbind, x)   
    } else {
        x
    }
}))

And finish up by renaming the columns.

cols <- c("Year", paste0("Value", LETTERS[1:3]))
names(final) <- c("id", "Class", cols, paste0("Y2", cols))
rownames(final) <- 1:nrow(final)

Result:

> final
   id Class Year ValueA ValueB ValueC Y2Year Y2ValueA Y2ValueB Y2ValueC
1   1     c 2010   0.10   0.42   1.30   2011     0.12     0.58     0.81
2   1     c 2011   0.12   0.58   0.81   2012     0.15     0.38     1.07
3   5     c 2010   0.05   0.22   1.09   2011     0.27     0.27     1.42
4   5     c 2011   0.27   0.27   1.42   2012     0.10     0.20     1.19
5   8     f 2014   0.00   0.00   0.59   2015     0.11     0.11     0.66
6   9     c 2010   0.00   0.00   0.35   2011     0.00     0.00     0.30
7  12     c 2010   0.19   0.38   2.32   2011     0.23     0.31     2.45
8  13     c 2011   0.00   0.15   3.24   2012     0.08     0.17     2.45
9  14     c 2011   0.00   0.13   0.73   2012     0.00     0.04     0.50
10 15     d 2010   0.99   1.34   5.88   2011     0.87     1.18     5.01
11 15     d 2011   0.87   1.18   5.01   2012     0.95     1.29     6.09
12 15     d 2012   0.95   1.29   6.09   2013     0.84     1.15     6.08
13 15     d 2013   0.84   1.15   6.08   2014     1.05     1.49     4.98
14 15     d 2014   1.05   1.49   4.98   2015     0.77     1.07     5.12
15 15     e 2010   0.51   0.92   5.96   2011     0.71     0.98     5.27
16 15     e 2011   0.71   0.98   5.27   2012     0.97     1.06     5.44
17 15     e 2012   0.97   1.06   5.44   2013     1.38     1.74     5.50
18 15     e 2013   1.38   1.74   5.50   2014     0.53     0.75     3.92
19 15     e 2014   0.53   0.75   3.92   2015     1.04     1.33     5.70

I ended up with one more row than your output. Perhaps you missed a row?

Weihuang Wong
  • 12,868
  • 2
  • 27
  • 48
  • Thanks; that gave me exactly what I wanted. And you even found the mistake in my manually generated output. I missed id = 12 from my output, but your code successfully included it :) – cookie Jul 31 '16 at 17:08