2

Say I have data that looks like

date, user, items_bought, event_number
2013-01-01, x, 2, 1
2013-01-02, x, 1, 2
2013-01-03, x, 0, 3
2013-01-04, x, 0, 4
2013-01-04, x, 1, 5
2013-01-04, x, 2, 6
2013-01-05, x, 3, 7
2013-01-06, x, 1, 8
2013-01-01, y, 1, 1
2013-01-02, y, 1, 2
2013-01-03, y, 0, 3
2013-01-04, y, 5, 4
2013-01-05, y, 6, 5
2013-01-06, y, 1, 6

To get the cumulative sum per user per data point, I was doing

data.frame(cum_items_bought=unlist(tapply(as.numeric(data$items_bought), data$user, FUN = cumsum)))

output from this looks like

date, user, items_bought
2013-01-01, x, 2
2013-01-02, x, 3
2013-01-03, x, 3
2013-01-04, x, 3
2013-01-04, x, 4
2013-01-04, x, 6
2013-01-05, x, 9
2013-01-06, x, 10
2013-01-01, y, 1
2013-01-02, y, 2
2013-01-03, y, 2
2013-01-04, y, 7
2013-01-05, y, 13
2013-01-06, y, 14
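As a side note on the running total itself: tapply followed by unlist returns the values grouped by user level, so the result lines up with the rows only when the data is already sorted by user. A minimal sketch with ave(), which returns the values in the original row order (the data frame name purchases is made up for illustration):

```r
# Sample values from the question: 8 rows for user x, then 6 rows for user y
purchases <- data.frame(
  user = rep(c("x", "y"), times = c(8, 6)),
  items_bought = c(2, 1, 0, 0, 1, 2, 3, 1, 1, 1, 0, 5, 6, 1)
)

# ave() applies cumsum within each user and keeps the original row order
purchases$cum_items_bought <- ave(purchases$items_bought, purchases$user,
                                  FUN = cumsum)
```

Because ave() returns a vector as long as its input, the result can be assigned straight back as a new column.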

However, I want to restrict the sum to rows that happened within 3 days of each row (per user), i.e. the current date and the two days before it. The output needs to look like this:

date, user, cum_items_bought_3_days
2013-01-01, x, 2
2013-01-02, x, 3
2013-01-03, x, 3
2013-01-04, x, 1
2013-01-04, x, 2
2013-01-04, x, 4
2013-01-05, x, 6
2013-01-06, x, 7
2013-01-01, y, 1
2013-01-02, y, 2
2013-01-03, y, 2
2013-01-04, y, 6
2013-01-05, y, 11
2013-01-06, y, 12
Jk1
shecode
  • The formatting was better the first time. If you have other changes, go ahead, but leave the code/data as is. – joran Jun 03 '14 at 16:28
  • I need to mention: there can be more than one row per date for each user (the rows are ordered by an epoch), so I'd like to sum up everything within the 3 days before (including the rows on the same day, but before the row of interest) – shecode Jun 06 '14 at 14:40
  • @user31260, please provide feedback on the answers below, i.e., if they satisfy your needs computations time wise or on any other aspect. Thanks – David Arenburg Jun 09 '14 at 18:15
  • I've had to add things to the data set to demonstrate what I want to happen when there is more than one row per date for a user. My apologies for not having a clearer example to begin with. I believe people are thinking I want to aggregate at the date level first, but this is not the case. I want everything to be relative to within X dates, but also have a condition that the event number for that user is before or equal to the current event number. Please see the example above for what happens when there is more than one row on the 4th of January for user x – shecode Jun 10 '14 at 08:42

7 Answers

6

Here's a dplyr solution which will produce the desired result (14 rows) as specified in the question. Note that it takes care of duplicate date entries, for example, 2013-01-04 for user x.

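# define a custom function to be used in the dplyr chain
# x is the data for one user; i indexes the rows of that user, so the
# function does not rely on event_number running from 1 to n
myfunc <- function(x){
  with(x, sapply(seq_along(event_number), function(i) 
    sum(items_bought[event_number <= event_number[i] & date[i] - date <= 2])))
}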

library(dplyr)                 # load dplyr (install it first if needed)

df %>%
  mutate(date = as.Date(as.character(date))) %>%
  group_by(user) %>%
  do(data.frame(., cum_items_bought_3_days = myfunc(.))) %>%
  select(-c(items_bought, event_number))

#         date user cum_items_bought_3_days
#1  2013-01-01    x                       2
#2  2013-01-02    x                       3
#3  2013-01-03    x                       3
#4  2013-01-04    x                       1
#5  2013-01-04    x                       2
#6  2013-01-04    x                       4
#7  2013-01-05    x                       6
#8  2013-01-06    x                       7
#9  2013-01-01    y                       1
#10 2013-01-02    y                       2
#11 2013-01-03    y                       2
#12 2013-01-04    y                       6
#13 2013-01-05    y                      11
#14 2013-01-06    y                      12

In my answer I use a custom function, myfunc, inside a dplyr chain. This is done with dplyr's do(), which passes myfunc the rows of df for one user at a time. Inside the function, sapply goes through that user's events one by one and sums items_bought over the rows whose event_number is not greater than the current one and whose date is at most 2 days earlier. The last line of the dplyr chain drops the columns that are no longer needed.

Let me know if you'd like a more detailed explanation.
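For readers who want to check the rule without dplyr, here is a minimal base R sketch of the same condition. It rebuilds the question's sample data inline; the names purchases and window_sum are made up for illustration and are not part of the answer above.

```r
# Sample data from the question, rebuilt inline so the sketch runs on its own
purchases <- data.frame(
  date = as.Date("2013-01-01") + c(0, 1, 2, 3, 3, 3, 4, 5, 0, 1, 2, 3, 4, 5),
  user = rep(c("x", "y"), times = c(8, 6)),
  items_bought = c(2, 1, 0, 0, 1, 2, 3, 1, 1, 1, 0, 5, 6, 1),
  event_number = c(1:8, 1:6)
)

# For one user's rows: sum items_bought over rows that are not later in event
# order and fall on the current date or within the (days - 1) days before it
window_sum <- function(d, days = 3) {
  vapply(seq_len(nrow(d)), function(i) {
    in_window <- d$event_number <= d$event_number[i] &
      as.numeric(d$date[i] - d$date) < days
    sum(d$items_bought[in_window])
  }, numeric(1))
}

# split() by user, compute per group, then unsplit() back into row order
pieces <- split(purchases, purchases$user)
purchases$cum_items_bought_3_days <-
  unsplit(lapply(pieces, window_sum), purchases$user)
```

The loop compares every row with every other row of the same user, so the cost grows quadratically with the number of rows per user; that is fine for a check like this but may be slow on large data.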

Edit after comment by OP:

If you need more flexibility to conditionally sum up other columns as well, you can adjust the code as follows. I assume here that the other columns should be summed up the same way as items_bought. If that is not correct, please specify how you want to sum up the other columns.

I first create two additional columns with random numbers in the data (I'll post a dput of the data at the bottom of my answer):

set.seed(99)   # for reproducibility only

df$newCol1 <- sample(0:10, 14, replace=T)
df$newCol2 <- runif(14)

df
#         date user items_bought event_number newCol1     newCol2
#1  2013-01-01    x            2            1       6 0.687800094
#2  2013-01-02    x            1            2       1 0.640190769
#3  2013-01-03    x            0            3       7 0.357885360
#4  2013-01-04    x            0            4      10 0.102584999
#5  2013-01-04    x            1            5       5 0.097790922
#6  2013-01-04    x            2            6      10 0.182886256
#7  2013-01-05    x            3            7       7 0.227903474
#8  2013-01-06    x            1            8       3 0.080524150
#9  2013-01-01    y            1            1       3 0.821618422
#10 2013-01-02    y            1            2       1 0.591113977
#11 2013-01-03    y            0            3       6 0.773389019
#12 2013-01-04    y            5            4       5 0.350085977
#13 2013-01-05    y            6            5       2 0.006061323
#14 2013-01-06    y            1            6       7 0.814506223

Next, you can modify myfunc to take 2 arguments, instead of 1. The first argument will remain the subsetted data.frame as before (represented by . inside the dplyr chain and x in the function definition of myfunc), while the second argument to myfunc will specify the column to sum up (colname).

# i indexes the rows of one user, so event_number need not run from 1 to n
myfunc <- function(x, colname){
  with(x, sapply(seq_along(event_number), function(i) 
    sum(x[event_number <= event_number[i] & date[i] - date <= 2, colname])))
}

Then, you can use myfunc several times if you want to conditionally sum up several columns:

df %>%
  mutate(date = as.Date(as.character(date))) %>%
  group_by(user) %>%
  do(data.frame(., cum_items_bought_3_days = myfunc(., "items_bought"),
                   newCol1Sums = myfunc(., "newCol1"),            
                   newCol2Sums = myfunc(., "newCol2"))) %>%
  select(-c(items_bought, event_number, newCol1, newCol2))

#         date user cum_items_bought_3_days newCol1Sums newCol2Sums
#1  2013-01-01    x                       2           6   0.6878001
#2  2013-01-02    x                       3           7   1.3279909
#3  2013-01-03    x                       3          14   1.6858762
#4  2013-01-04    x                       1          18   1.1006611
#5  2013-01-04    x                       2          23   1.1984520
#6  2013-01-04    x                       4          33   1.3813383
#7  2013-01-05    x                       6          39   0.9690510
#8  2013-01-06    x                       7          35   0.6916898
#9  2013-01-01    y                       1           3   0.8216184
#10 2013-01-02    y                       2           4   1.4127324
#11 2013-01-03    y                       2          10   2.1861214
#12 2013-01-04    y                       6          12   1.7145890
#13 2013-01-05    y                      11          13   1.1295363
#14 2013-01-06    y                      12          14   1.1706535

You have now created conditional sums of the columns items_bought, newCol1 and newCol2. You can also leave out any of the sums in the dplyr chain or add more columns to sum up.
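When many columns need the same treatment, the repeated calls can be replaced by a loop over column names. A minimal base R sketch, assuming a single user's rows (the first six rows of user x from the table above) and using the made-up names purchases, window_sum and cols:

```r
# First six rows of user x, with one of the extra columns from the example
purchases <- data.frame(
  date = as.Date("2013-01-01") + c(0, 1, 2, 3, 3, 3),
  user = "x",
  items_bought = c(2, 1, 0, 0, 1, 2),
  newCol1 = c(6, 1, 7, 10, 5, 10),
  event_number = 1:6
)

# Same window rule as before, with the column chosen by name
window_sum <- function(d, colname) {
  vapply(seq_len(nrow(d)), function(i) {
    keep <- d$event_number <= d$event_number[i] &
      as.numeric(d$date[i] - d$date) <= 2
    sum(d[[colname]][keep])
  }, numeric(1))
}

# One output column per input column, named <column>_3_days
cols <- c("items_bought", "newCol1")
sums <- lapply(setNames(cols, paste0(cols, "_3_days")),
               function(col) window_sum(purchases, col))
result <- cbind(purchases[c("date", "user")], as.data.frame(sums))
```

For several users, the same lapply can be run on each piece of split(purchases, purchases$user).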

Edit #2 after comment by OP:

To calculate the cumulative count of distinct (unique) values of items_bought per user, you could define a second custom function, myfunc2, and use it inside the dplyr chain. Like myfunc, this function takes the column name as an argument, so you can apply it to any column.

The code would then be:

myfunc <- function(x, colname){
  with(x, sapply(seq_along(event_number), function(i) 
    sum(x[event_number <= event_number[i] & date[i] - date <= 2, colname])))
}

# running count of distinct values: a row adds 1 only the first time its value appears
myfunc2 <- function(x, colname){
  cumsum(!duplicated(x[[colname]]))
}

library(dplyr)                 # load dplyr (install it first if needed)

df %>%
  mutate(date = as.Date(as.character(date))) %>%
  group_by(user) %>%
  do(data.frame(., cum_items_bought_3_days = myfunc(., "items_bought"),
                   newCol1Sums = myfunc(., "newCol1"),
                   newCol2Sums = myfunc(., "newCol2"),
                   distinct_items_bought = myfunc2(., "items_bought"))) %>%   
  select(-c(items_bought, event_number, newCol1, newCol2))
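If the distinct count should also respect the 3-day window (the discussion leaves this open), one possible variant counts the unique values among the rows inside the window. This is only a sketch on user x's rows; purchases and window_distinct are hypothetical names.

```r
# User x's rows from the question
purchases <- data.frame(
  date = as.Date("2013-01-01") + c(0, 1, 2, 3, 3, 3, 4, 5),
  items_bought = c(2, 1, 0, 0, 1, 2, 3, 1),
  event_number = 1:8
)

# Number of distinct values of `colname` among the rows in the window
window_distinct <- function(d, colname) {
  vapply(seq_len(nrow(d)), function(i) {
    keep <- d$event_number <= d$event_number[i] &
      as.numeric(d$date[i] - d$date) <= 2
    length(unique(d[[colname]][keep]))
  }, integer(1))
}

purchases$distinct_3_days <- window_distinct(purchases, "items_bought")

# Running count of distinct values with no window, for comparison
purchases$distinct_running <- cumsum(!duplicated(purchases$items_bought))
```

The two columns agree until values start to drop out of the window (from the first 2013-01-04 row on in this data).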

Here is the data I used:

dput(df)
structure(list(date = structure(c(1L, 2L, 3L, 4L, 4L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("2013-01-01", "2013-01-02", 
"2013-01-03", "2013-01-04", "2013-01-05", "2013-01-06"), class = "factor"), 
user = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c(" x", " y"), class = "factor"), 
items_bought = c(2L, 1L, 0L, 0L, 1L, 2L, 3L, 1L, 1L, 1L, 
0L, 5L, 6L, 1L), event_number = c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L), newCol1 = c(6L, 1L, 7L, 
10L, 5L, 10L, 7L, 3L, 3L, 1L, 6L, 5L, 2L, 7L), newCol2 = c(0.687800094485283, 
0.640190769452602, 0.357885359786451, 0.10258499882184, 0.0977909218054265, 
0.182886255905032, 0.227903473889455, 0.0805241498164833, 
0.821618422167376, 0.591113976901397, 0.773389018839225, 
0.350085976999253, 0.00606132275424898, 0.814506222726777
)), .Names = c("date", "user", "items_bought", "event_number", 
"newCol1", "newCol2"), row.names = c(NA, -14L), class = "data.frame")
talat
  • This is very good thanks. I would look to make the function be more flexible as in my data set I have multiple columns similar to items_bought that I want to to do sums/ counts etc on, is there a way to do this? – shecode Jun 11 '14 at 09:45
  • @user31260 do you want to calculate the sums for the other columns exactly the same way as for `cum_items_bought_3_days`? If not, could you describe in more detail, how the other functions should be summed up? – talat Jun 11 '14 at 10:21
  • For some columns, yes, but for some others I may, for instance, want to count up the number of distinct types of items bought etc... – shecode Jun 11 '14 at 10:30
  • I edited my answer to make `myfunc` more flexible. You can now specify any columns you want to sum up based on the same conditions as for `items_bought`. If you want to count up the number of distinct types of items bought, do you mean a normal cumulative sum or also on the condition that the date is within 3 days of the current row? – talat Jun 11 '14 at 10:46
  • having the flexibility for doing both those examples within the same function would be ideal. thanks so much for your answer :) – shecode Jun 11 '14 at 10:47
  • And what do you consider a distinct type of items bought? Do you mean the unique numbers in column `items_bought`? They seem to represent the number of items, not what kind of (distinct) items? – talat Jun 11 '14 at 10:54
  • @user31260 could you edit your question to include an example of how you would want the output to look after counting distinct types of items bought? It is not clear to me how you want to do that given the sample data. – talat Jun 11 '14 at 11:12
  • I know what you mean, but its more a general case. So counting the distinct unique numbers in the column items_bought would suffice. I can manage to convert the logic from there – shecode Jun 11 '14 at 13:28
3

I'd like to propose an additional data.table approach, combined with the rollapplyr function from the zoo package.

First, we will aggregate the items_bought column per user per unique date (as you pointed out, there can be more than one row per date per user).

library(data.table)
data <- setDT(data)[, lapply(.SD, sum), by = c("user", "date"), .SDcols = "items_bought"]

Next, we will apply rollapplyr with sum over a window of 3 rows per user (one row per day after the aggregation), using partial = TRUE so that the first rows, which have fewer than 3 days of history, are still summed (thanks for the advice @G. Grothendieck).

library(zoo)
data[, cum_items_bought_3_days := lapply(.SD, rollapplyr, 3, sum, partial = TRUE), .SDcols = "items_bought", by = user]

#     user       date items_bought cum_items_bought_3_days
#  1:    x 2013-01-01            2                       2
#  2:    x 2013-01-02            1                       3
#  3:    x 2013-01-03            0                       3
#  4:    x 2013-01-04            0                       1
#  5:    x 2013-01-05            3                       3
#  6:    x 2013-01-06            1                       4
#  7:    y 2013-01-01            1                       1
#  8:    y 2013-01-02            1                       2
#  9:    y 2013-01-03            0                       2
# 10:    y 2013-01-04            5                       6
# 11:    y 2013-01-05            6                      11
# 12:    y 2013-01-06            1                      12

This is the data set I've used

data <- structure(list(date = structure(c(15706, 15707, 15708, 15709, 15710, 15711, 15706, 15707, 15708, 15709, 15710, 15711), class = "Date"), user = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(" x", " y"), class = "factor"), items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L)), .Names = c("date", "user", "items_bought"), row.names = c(NA, -12L), class = "data.frame")
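One caveat: a rolling window of width 3 covers three rows, which equals three days only when each user has exactly one row for every consecutive day, as in the data set above. If days can be missing, the gaps would need to be filled with zeros first. A minimal base R sketch of that idea; roll_sum_rows is a hand-written stand-in for a right-aligned partial rolling sum, and sales and fill_days are made-up names.

```r
# One user's daily totals with 2013-01-03 and 2013-01-04 missing
sales <- data.frame(
  date = as.Date("2013-01-01") + c(0, 1, 4, 5),
  items_bought = c(2, 1, 3, 1)
)

# Right-aligned rolling sum over the last k rows, partial at the start
roll_sum_rows <- function(v, k) {
  cs <- cumsum(v)
  cs - c(rep(0, k), cs)[seq_along(cs)]
}

# Insert zero-purchase rows for the days that are absent from the data
fill_days <- function(d) {
  all_days <- data.frame(date = seq(min(d$date), max(d$date), by = "day"))
  out <- merge(all_days, d, by = "date", all.x = TRUE)
  out$items_bought[is.na(out$items_bought)] <- 0
  out
}

by_rows <- roll_sum_rows(sales$items_bought, 3)   # window of 3 rows
filled <- fill_days(sales)
filled$cum_items_bought_3_days <- roll_sum_rows(filled$items_bought, 3)
```

On the unfilled data the third value sums purchases that are four days apart; after filling, each window spans exactly three calendar days.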
David Arenburg
  • Note that `rollapply` supports `partial=TRUE` and there exists `rollapplyr` so we can write the `rollsum` line as `data[, cum_items_bought_3_days := lapply(.SD, rollapplyr, 3, sum, partial = TRUE), .SDcols = "items_bought", by = user]` in which case we are done at that point. – G. Grothendieck Jun 09 '14 at 14:52
  • Thanks, @G.Grothendieck, I've edited the answer. I wanted to avoid rollapply because the `zoo` package documentation states that `rollsum` is more optimized for speed, but I apparently missed the `partial = T` option – David Arenburg Jun 09 '14 at 18:19
  • @beginneR, that's because he wanted to remove duplicated dates (which he had in the original data), see the beginning of my explanation – David Arenburg Jun 10 '14 at 12:17
  • Okay, I didnt get that from reading the question. I'll delete my comment then. – talat Jun 10 '14 at 12:25
2

Here is a fairly simple method:

# replicate your data, shifting the days ahead by your required window,
# and rbind into a single data frame
d <- do.call(rbind,lapply(0:2, function(x) transform(data,date=date+x)))

# use aggregate to add it together, subsetting out "future" days
aggregate(items_bought~date+user,subset(d,date<=max(data$date)),sum)
         date user items_bought
1  2013-01-01    x            2
2  2013-01-02    x            3
3  2013-01-03    x            3
4  2013-01-04    x            1
5  2013-01-05    x            3
6  2013-01-06    x            4
7  2013-01-01    y            1
8  2013-01-02    y            2
9  2013-01-03    y            2
10 2013-01-04    y            6
11 2013-01-05    y           11
12 2013-01-06    y           12
James
  • Hi. Thanks for your response. I should have been more detailed with my question. I've actually got multiple rows for the same dates as mine is at the "seconds" level. I wanted to do a rolling sum/avg etc but within x days. So I don't think shifting is going to work. Its a good solution though if my data was fully aggregated. Thanks – shecode Jun 06 '14 at 13:39
  • @user31260 Can you not just convert your detailed time to a `Date` class variable, or are the seconds important? The `aggregate` step will add up all rows, so multiple rows aren't an issue. Though depending on your data size, it might be better to aggregate prior to replicating. – James Jun 06 '14 at 13:42
  • thanks, but this doesn't work for me as I don't actually want to pre-aggregate things, I want the sum to consider the current day before the current row etc – shecode Jun 10 '14 at 09:12
1

The following looks valid:

unlist(lapply(split(data, data$user), 
              function(x) {
                 ave(x$items_bought, 
                 cumsum(c(0, diff(x$date)) >= 3), FUN = cumsum) 
              }))   
#x1  x2  x3  x4  y1  y2  y3  y4 
# 2   3   3   4   1   6   6   7
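The grouping expression cumsum(c(0, diff(x$date)) >= 3) can be unpacked step by step. The sketch below uses the four dates and purchases of user x from this answer's data; the intermediate variable names are made up for illustration.

```r
dates <- as.Date("2013-01-01") + c(0, 1, 4, 5)
items_bought <- c(2, 1, 3, 1)

# Days since the previous row (0 for the first row)
gap_days <- c(0, as.numeric(diff(dates)))

# TRUE where a row comes 3 or more days after the previous one
starts_new_block <- gap_days >= 3

# Each TRUE bumps the counter, so rows between gaps share one id
block_id <- cumsum(starts_new_block)

# Running total that restarts at every new block
running_total <- ave(items_bought, block_id, FUN = cumsum)
```

So the total resets only after a gap of at least 3 days between consecutive rows. That differs from a sliding 3-day window, in which old rows drop out even when there is no gap.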

The data object used above is:

data = structure(list(date = structure(c(15706, 15707, 15710, 15711, 
15706, 15707, 15710, 15711), class = "Date"), user = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c(" x", " y"), class = "factor"), 
    items_bought = c(2L, 1L, 3L, 1L, 1L, 5L, 6L, 1L)), .Names = c("date", 
"user", "items_bought"), row.names = c(NA, -8L), class = "data.frame")
alexis_laz
  • Thanks very much. can you explain what this bit here does cumsum(c(0, diff(x$date)) >= 3), ? – shecode Jun 05 '14 at 10:59
  • OK so this does not work unless it actually sees a 3 day gap in the data. it keeps cumulative summing until then. I'm trying to get a window sum – shecode Jun 05 '14 at 13:48
  • data = structure(list(date = structure(c(15706, 15707, 15708, 15709, 15710, 15711, 15706, 15707, 15708, 15709, 15710, 15711), class = "Date"), user = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(" x", " y"), class = "factor"), items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L)), .Names = c("date", "user", "items_bought"), row.names = c(NA, -12L), class = "data.frame") and the answer should be: 2,3,3,1,3,4,1,2,2,6,11,12 – shecode Jun 05 '14 at 15:13
  • I added some things to the sample data set to be clearer – shecode Jun 05 '14 at 16:02
1

Here is an approach that doesn't use cumsum but a nested lapply instead. The outer lapply goes over the users; for each user, the inner lapply builds the desired data frame by summing all items bought on each date and on the two days before it. Note that if data$date were not sorted, it would have to be sorted in ascending order first.

data <- structure(list(
    date = structure(c(15706, 15707, 15708, 15709, 15710, 15711, 
        15706, 15707, 15708, 15709, 15710, 15711), class = "Date"), 
    user = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y"),
    items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L)),
    .Names = c("date", "user", "items_bought"),
    row.names = c(NA, -12L),
    class = "data.frame")

# the window is built with seq() so that %in% compares Date values with Date values
do.call(rbind, lapply(unique(data$user),
   function(u) {
       subd <- subset(data, user == u)
       do.call(rbind, lapply(subd$date, 
           function(x) data.frame(date = x, 
               user = u, items_bought = 
               sum(subd[subd$date %in% seq(x - 2, x, by = "day"), "items_bought"]))))
}))

Edit

To deal with several timestamps per day (more than 1 row per date), I would first aggregate by summing all items bought on the same day for each user. You can do that with e.g. the built-in function aggregate, but if your data is large you can also use data.table for speed. I'll call your original data frame (with more than 1 row per date) predata and the aggregated one (1 row per date) data. So by calling

library(data.table)
predt <- data.table(predata)
setkey(predt, date, user)
data <- predt[, list(items_bought = sum(items_bought)), by = key(predt)]

you get a data frame containing one row per date and columns date, user, items_bought. Now, I think the following way will be faster than the nested lapply above, but I am not sure since I cannot test it on your data. I am using data.table because it is meant to be fast (if used the right way, which I am not sure this is). The inner loop will be replaced by a function f. I do not know if there is a neater way, avoiding this function and replacing the double loop with only one call to data.table, or how to write a data.table call that would execute faster.

library(data.table)
dt <- data.table(data)
setkey(dt, user)
f <- function(d, u) {
    do.call(rbind, lapply(d$date, function(x) data.frame(date = x,
        items_bought = d[date %in% seq(x - 2, x, by = "day"), sum(items_bought)])))
}
data <- dt[, f(.SD, user), by = user]

Another way, which doesn't use data.table, assuming that you have enough RAM (again, I don't know the size of your data), is to store items bought 1 day before in a vector, then items bought 2 days before in another vector, etc, and to sum them up in the end. Something like

sumlist <- vector("list", 2) # this will hold two vectors: items bought 1 day ago 
    # and items bought 2 days ago
for (i in 1:2) {
    # tmpstr will be used to find the items that a given user bought i days ago
    tmpstr <- paste(data$date - i, data$user, sep = "|")
    tmpv <- data$items_bought[
        match(tmpstr, paste(data$date, data$user, sep = "|"))]
    # if a date is not in the original data, assume no purchases
    tmpv[is.na(tmpv)] <- 0
    sumlist[[i]] <- tmpv
}
# finally, add up items bought in the past as well as the present day
data$cum_items_bought_3_days <- 
    rowSums(as.data.frame(sumlist)) + data$items_bought

A final thing I would try would be to parallelize the lapply calls, e.g. by using the function mclapply instead, or by re-writing the code using the parallel functionality of foreach or plyr. Depending on the strength of your PC and the size of the task, this may outperform the data.table single-core performance...
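A minimal sketch of the parallel idea with mclapply from the parallel package, which ships with R. It works at the date level like the code above (one row per user per date); purchases, window_sum_by_date and n_cores are made-up names, and n_cores = 1 is only a placeholder.

```r
library(parallel)

# One row per user per date, same values as the data at the top of this answer
purchases <- data.frame(
  date = as.Date("2013-01-01") + rep(0:5, times = 2),
  user = rep(c("x", "y"), each = 6),
  items_bought = c(2, 1, 0, 0, 3, 1, 1, 1, 0, 5, 6, 1)
)

# For one user: items bought on each date and the two days before it
window_sum_by_date <- function(d) {
  vapply(seq_len(nrow(d)), function(i) {
    sum(d$items_bought[d$date >= d$date[i] - 2 & d$date <= d$date[i]])
  }, numeric(1))
}

# Assumption: raise n_cores on Unix-alikes; on Windows mclapply needs mc.cores = 1
n_cores <- 1L
pieces <- split(purchases, purchases$user)
sums <- mclapply(pieces, window_sum_by_date, mc.cores = n_cores)
purchases$cum_items_bought_3_days <- unsplit(sums, purchases$user)
```

Each user is handled by one worker, so the gain depends on having many users of roughly similar size.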

konvas
  • Thanks for your attempt. It does work, however my data set is quite large and its very slow to implement. Could the performance be improved? Also, my actual data set actually has a lot of timestamps at the seconds interval, is it possible to do this cumulative sum per row where the timestamps/dates lie within the date criteria? (I.e I have more than one row per date) – shecode Jun 06 '14 at 11:20
1

It seems like the packages xts and zoo contain functions that do what you want, although you may have the same problems with the size of your actual dataset as with @alexis_laz's answer. Using the functions from the xts answer to this question seems to do the trick.

First I took the code from the answer I link to above and made sure it worked for just one user. I include the apply.daily function because I believe from your edits/comments that you have multiple observations for some days for some users - I added an extra line to the toy dataset to reflect this.

# Make dataset with two observations for one date for "y" user
dat <- structure(list(
    date = structure(c(15706, 15707, 15708, 15709, 15710, 15711, 
        15706, 15707, 15708, 15709, 15710, 15711, 15711), class = "Date"), 
    user = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y", "y"),
    items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L, 0L)),
    .Names = c("date", "user", "items_bought"),
    row.names = c(NA, -13L),
    class = "data.frame")

# Load xts package (also loads zoo)
require(xts)

# See if this works for one user
dat1 = subset(dat, user == "y")
# Create "xts" object for use with apply.daily()
dat1.1 = xts(dat1$items_bought, dat1$date)
dat2 = apply.daily(dat1.1, sum)
# Now use rollapply with a 3-day window
# The "partial" argument appears to only work with zoo objects, not xts
sum.itemsbought = rollapply(zoo(dat2), 3, sum, align = "right", partial = TRUE)

I thought the output could look nicer (more like example output from your question). I haven't worked with zoo objects much, but the answer to this question gave me some pointers for putting the info into a data.frame.

data.frame(Date=time(sum.itemsbought), sum.itemsbought, row.names=NULL)

Once I had this worked out for one user, it was straightforward to expand this to the entire toy dataset. This is where speed could become an issue. I use lapply and do.call for this step.

allusers = lapply(unique(dat$user), function(x) {
    dat1 = dat[dat$user == x,]
    dat1.1 = xts(dat1$items_bought, dat1$date)
    dat2 = apply.daily(dat1.1, sum)
    sum.itemsbought = rollapply(zoo(dat2), 3, sum, align = "right", partial = TRUE)
    data.frame(Date=time(sum.itemsbought), user = x, sum.itemsbought, row.names=NULL)
} )
do.call(rbind, allusers)
aosmith
1

I like James' answer better, but here's an alternative:

with(data,{
  sapply(split(data,user),function(x){
    sapply(x$date,function(y) sum(x$items_bought[x$date %in% c(y,y-1,y-2)]))
  })
})
userNaN
  • Hi. This doesn't really give me the output I need (shown above). Thanks for your attempt. It replicates the sums for the same day, but I want the sums to follow the order of the data, as demonstrated above – shecode Jun 09 '14 at 09:41
  • assume that we have event number as a column, numbered 1:6 for user x, and 1:6 for user y. So we want the logic to also say, where the event number is less than the current row – shecode Jun 09 '14 at 10:02
  • Just aggregate, sort, and merge your data? – userNaN Jun 09 '14 at 22:01
  • e.g. assign above output to variable "z" and append it to your properly ordered dataset using `c(z[,1],z[,2])`. – userNaN Jun 09 '14 at 23:22