5

I have two data frames. For each time, I need to subtract the matching columns of one from the other and store the results in a different data frame:

dput(t)

structure(list(time = structure(c(2L, 1L, 3L), .Label = c("1/13/15 1:18 PM", 
"1/13/15 12:18 PM", "1/13/15 2:18 PM"), class = "factor"), web01 = c(24083L, 
24083L, 24083L), web03 = c(24083L, 24083L, 24083L)), .Names = c("time", 
"web01", "web03"), class = "data.frame", row.names = c(NA, -3L
))

dput(d)

structure(list(time = structure(c(2L, 1L, 3L), .Label = c("1/13/15 1:18 PM", 
"1/13/15 12:18 PM", "1/13/15 2:18 PM"), class = "factor"), web01 = c(7764.8335, 
7725, 7711.5), web03 = c(10885.5, 10582.333, 10104.5)), .Names = c("time", 
"web01", "web03"), class = "data.frame", row.names = c(NA, -3L
))

Data frames t and d are just samples; my actual data frames have 20 columns. Here, t and d have the same column names, and the time is the same in each row of both data frames.

I need to subtract d from t for the same time period and store the result in a different data frame. Any ideas how I could do this in R?

user1471980

3 Answers

10

Update

rbind_list and rbind_all have been deprecated. Instead use bind_rows.

Based on discussions in comments and inspired by Andrew's answer:

library(dplyr)
df <- bind_rows(d, t) %>%
  # %y (not %Y): the sample timestamps use two-digit years, e.g. "1/13/15"
  group_by(time = as.POSIXct(time, format = "%m/%d/%y %I:%M %p")) %>%
  summarise_each(funs(diff(.))) %>%
  data.frame()

This keeps time in chronological order and converts the result to a regular data.frame.
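In current dplyr releases, summarise_each() and funs() are themselves deprecated. The following is a minimal sketch of the same bind, group, and diff idea using across(), assuming dplyr 1.0 or later. The sample data is rebuilt from the question with time as character, and tz = "UTC" is an assumption added here only to make the parsed times reproducible.

```r
library(dplyr)

# Sample data rebuilt from the question (time kept as character for brevity)
stamps <- c("1/13/15 12:18 PM", "1/13/15 1:18 PM", "1/13/15 2:18 PM")
t <- data.frame(time = stamps,
                web01 = c(24083L, 24083L, 24083L),
                web03 = c(24083L, 24083L, 24083L))
d <- data.frame(time = stamps,
                web01 = c(7764.8335, 7725, 7711.5),
                web03 = c(10885.5, 10582.333, 10104.5))

res <- bind_rows(d, t) %>%
  # Parse the timestamps so groups sort chronologically; tz is an assumption
  group_by(time = as.POSIXct(time, format = "%m/%d/%y %I:%M %p", tz = "UTC")) %>%
  # Each group holds the d row followed by the t row, so diff() gives t - d
  summarise(across(everything(), diff)) %>%
  as.data.frame()

print(res)
```

Because d is bound first, diff() within each time group returns the t value minus the d value for every non-grouping column.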

Steven Beaupré
  • 1
    Thanks Steven! I was starting to edit my post to reflect your additions, but this works better. – Andrew Taylor Jan 23 '15 at 20:23
  • 1
    If you wanted to save a few key strokes, you could create the time grouping column inside the group_by and remove the mutate step before. `group_by(time = as.POSIXct(time, format="%m/%d/%Y %I:%M %p"))` – talat Jan 24 '15 at 18:50
  • @docendodiscimus Thanks for the suggestion. I edited the answer accordingly. – Steven Beaupré Jan 24 '15 at 18:57
  • I think you are right, @Arun. The data.frame() call is probably not required either – talat Jan 24 '15 at 22:33
  • @docendodiscimus the `data.frame()` call was only because OP wanted a regular dataframe as a result. – Steven Beaupré Jan 25 '15 at 04:02
3

Here's a data.table approach:

library(data.table)
rbindlist(list(d,t))[, lapply(.SD, diff),
                 by = .(time = as.POSIXct(time, format="%m/%d/%y %I:%M %p"))]

#                  time    web01    web03
#1: 2015-01-13 12:18:00 16318.17 13197.50
#2: 2015-01-13 13:18:00 16358.00 13500.67
#3: 2015-01-13 14:18:00 16371.50 13978.50

Edit: corrected date format and output, removed .SDcols = ... .

talat
2

Using dplyr:

library(dplyr)
newdata <-
  bind_rows(d, t) %>%   # bind_rows() replaces the deprecated rbind_list()
  group_by(time) %>%
  summarise_each(funs(diff(.)))



              time    web01    web03
1  1/13/15 1:18 PM 16358.00 13500.67
2 1/13/15 12:18 PM 16318.17 13197.50
3  1/13/15 2:18 PM 16371.50 13978.50
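Since the question states that both data frames have the same column names and the same time in each row, the subtraction can also be sketched in base R without binding or grouping. This is only a sketch under that row-alignment assumption; the names stamps, value_cols, and result are illustrative and not from the original post.

```r
# Sample data rebuilt from the question (time kept as character for brevity)
stamps <- c("1/13/15 12:18 PM", "1/13/15 1:18 PM", "1/13/15 2:18 PM")
t <- data.frame(time = stamps,
                web01 = c(24083L, 24083L, 24083L),
                web03 = c(24083L, 24083L, 24083L))
d <- data.frame(time = stamps,
                web01 = c(7764.8335, 7725, 7711.5),
                web03 = c(10885.5, 10582.333, 10104.5))

# Guard for the assumption: row i of t and row i of d refer to the same time
stopifnot(identical(as.character(t$time), as.character(d$time)))

# Subtract every non-time column element-wise, then put the time column back
value_cols <- setdiff(names(t), "time")
result <- cbind(t["time"], t[value_cols] - d[value_cols])

print(result)
```

If the rows could ever be out of order, the guard stops with an error, and a grouping or merge-based approach is the safer choice.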
Andrew Taylor
  • As I stated, my actual data frames have 21 columns. When I print newdata, it says something like this: Source: local data frame [168 x 21], and the variables are not shown. How could I see the whole newdata data frame? – user1471980 Jan 23 '15 at 20:07
  • Note that this will change the order of time. May I suggest: `df <- rbind_list(d,t) %>% mutate(time = as.POSIXct(time, format="%m/%d/%Y %I:%M %p")) %>% group_by(time) %>% summarise_each(funs(diff(.))) %>% arrange(time)` – Steven Beaupré Jan 23 '15 at 20:07
  • @user1471980 You see this because `newdata` is in a `tbl_df`. See http://stackoverflow.com/questions/23188900/view-entire-dataframe-when-wrapped-in-tbl-df – Steven Beaupré Jan 23 '15 at 20:09
  • @StevenBeaupré, is there a way to convert this back to regular data frame? – user1471980 Jan 23 '15 at 20:12
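For the two questions raised in these comments, here is a minimal sketch assuming a current dplyr installation. The small newdata object below is a stand-in built from the output shown in the answer, not the asker's real 168 x 21 result.

```r
library(dplyr)

# Stand-in for the summarised result, stored as a tibble like dplyr returns
newdata <- as_tibble(data.frame(
  time  = c("1/13/15 1:18 PM", "1/13/15 12:18 PM", "1/13/15 2:18 PM"),
  web01 = c(16358.00, 16318.17, 16371.50),
  web03 = c(13500.67, 13197.50, 13978.50)
))

# Convert back to a regular data frame
plain <- as.data.frame(newdata)
class(plain)

# Or keep the tibble and ask print() to show every row and column
print(newdata, n = Inf, width = Inf)
```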