
I would appreciate some help with the following task. In the data frame C below, for each id I would like to subtract the first entry in column d_2 from the last entry, and then store the results in another data frame containing the same ids. I can then merge this with my initial data frame. Please note that the subtraction has to be in this order (last entry minus first entry for each id).

Here is the code:

id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")

d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))

set.seed(2)

d_2 <- round(runif(13, -1, 1), 2)

C <- data.frame(id, d_1, d_2)

id   d_1   d_2
A1   1.15 -0.63
A1   1.15  0.40
B10  1.44  0.15
B10  1.44 -0.66
B500 1.34  0.89
B500 1.34  0.89
C100 1.50 -0.74
C100 1.50  0.67
C100 1.50 -0.06
D40  1.90  0.10
D40  1.90  0.11
G100 1.59 -0.52
G100 1.59  0.52

Desired result:

id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")

difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)

diff_df <- data.frame(id2, difference)

id2    difference
A1        1.03
B10      -0.81
B500      0.00
C100      0.68
D40       0.01
G100      1.04
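For comparison, here is a minimal base R sketch of the computation described above. It uses aggregate() with a small anonymous function instead of plyr or dplyr. The data frame C is rebuilt from the values printed above so the block runs on its own, and the aggregated column is renamed to difference to match the desired output.

```r
# Sample data from the question (the d_2 values printed above)
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
          "D40", "D40", "G100", "G100"),
  d_1 = c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3),
          rep(1.90, 2), rep(1.59, 2)),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
          0.10, 0.11, -0.52, 0.52)
)

# For each id, take the last d_2 value minus the first one.
# aggregate() passes each group's d_2 values in their original row order.
diff_df <- aggregate(d_2 ~ id, data = C,
                     FUN = function(x) x[length(x)] - x[1])

# Rename the aggregated column to match the desired result
names(diff_df)[names(diff_df) == "d_2"] <- "difference"
print(diff_df)
```

The ids come back sorted, which for this data is the same order as in the desired result.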

I attempted this by using ddply to obtain the first and last entries, but I'm really struggling with the function argument in the second line of code (below) to get the desired outcome.

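library(plyr)   # ddply() comes from the plyr package

C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])   # keep the first and last row of each id

ddply(C_1, .(id), function )   # incomplete: this is the function I can't figure out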

To be honest, I'm not very familiar with the plyr package (which provides ddply); I got the code above from another post on Stack Exchange.
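The missing piece is a function that receives all rows of one id as a data frame and returns the result for that id. The sketch below writes such a function (last_minus_first is a hypothetical name, not from the question) and, so that it runs without plyr, applies it per id with base R split() and lapply() rather than with ddply().

```r
# Sample data from the question (only the columns needed here)
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
          "D40", "D40", "G100", "G100"),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
          0.10, 0.11, -0.52, 0.52)
)

# Hypothetical per-group function: it receives all rows of one id
# (as a data frame) and returns a one-row data frame
last_minus_first <- function(x) {
  data.frame(id = x$id[1], difference = x$d_2[nrow(x)] - x$d_2[1])
}

# split() keeps the original row order within each id;
# rbind() stacks the one-row results into a single data frame
diff_df <- do.call(rbind, lapply(split(C, C$id), last_minus_first))
rownames(diff_df) <- NULL
print(diff_df)
```

Because ddply also hands each group to its function as a data frame, a function of this shape is the kind of thing its third argument expects; the first ddply step (keeping only the first and last rows) is not strictly needed for this calculation.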

My original data is a groupedData object, and I believe another way of approaching this is to use gapply, but again I'm struggling with its third argument (usually a function):

library(nlme)   # groupedData() and gapply() come from the nlme package

grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))

x1 <- gapply(grouped_C, "d_2", first_entry)

x2 <- gapply(grouped_C, "d_2", last_entry)

where first_entry and last_entry are functions that return the first and last entries. I can then get the difference with x2 - x1. However, I'm not sure what to use as first_entry and last_entry in the code above (perhaps something to do with head or tail?).
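The first_entry and last_entry helpers can be thin wrappers around head() and tail(). The sketch below defines them as hypothetical helpers with the names used in the question and, as an assumption, applies them per id with base R tapply() instead of gapply(), which avoids depending on how groupedData orders its groups.

```r
# Sample data from the question (only the columns needed here)
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
          "D40", "D40", "G100", "G100"),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
          0.10, 0.11, -0.52, 0.52)
)

# Hypothetical helpers named as in the question
first_entry <- function(x) head(x, 1)   # first element of a vector
last_entry  <- function(x) tail(x, 1)   # last element of a vector

# Apply each helper to the d_2 values of every id (tapply used instead of gapply)
x1 <- tapply(C$d_2, C$id, first_entry)
x2 <- tapply(C$d_2, C$id, last_entry)

# Last entry minus first entry, collected with the ids into a data frame
diff_df <- data.frame(id2 = names(x1), difference = as.vector(x2 - x1))
print(diff_df)
```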

Any help would be much appreciated.

John_dydx

2 Answers


This can be done easily with dplyr. Its first() and last() functions are very helpful for this task.

library(dplyr)               # install the dplyr package first, then load it

# %>% chains several operations together: each step receives the result of the
# previous one, so you don't have to name the data.frame again at every step.
# Here the chain starts from your data.frame C and the final result is stored in diff_df.
diff_df <- C %>%
  group_by(id) %>%            # group the whole data.frame C by id
  summarize(difference = last(d_2) - first(d_2))  # one summary row per id: last d_2 of the group minus its first d_2

#    id difference             #this is the result stored in diff_df
#1   A1       1.03
#2  B10      -0.81
#3 B500       0.00
#4 C100       0.68
#5  D40       0.01
#6 G100       1.04

Edit: updated the post to use %>% instead of the deprecated %.%.
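The question's next step was to merge the per-id differences back into C. Here is a minimal base R sketch of that step, assuming diff_df has an id column as in the output above. So that the block runs without dplyr, diff_df is rebuilt here with tapply().

```r
# Sample data from the question
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
          "D40", "D40", "G100", "G100"),
  d_1 = c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3),
          rep(1.90, 2), rep(1.59, 2)),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
          0.10, 0.11, -0.52, 0.52)
)

# Per-id differences (last d_2 minus first d_2)
d <- tapply(C$d_2, C$id, function(x) x[length(x)] - x[1])
diff_df <- data.frame(id = names(d), difference = as.vector(d))

# Attach each id's difference to every row of that id in the original data
C_merged <- merge(C, diff_df, by = "id")
print(C_merged)
```

merge() sorts the result by id by default. With dplyr loaded, left_join(C, diff_df, by = "id") does the same join while keeping the row order of C.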

talat
  • Thanks for your answer, much appreciated. Do you mind explaining the syntax and the various parts of the code? It's my first time using this package. – John_dydx May 13 '14 at 14:56
  • @John Sure. I added comments to explain the operations in my answer. For more details on `dplyr`, have a look at this introduction to the package (http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html) – talat May 13 '14 at 15:02
  • @beginneR Cool name, by the way! Thank you so much; the comments were very helpful, but I'll also go through the documentation. Thanks! – John_dydx May 13 '14 at 15:07
  • @docendo Warning messages: 1: '%.%' is deprecated. Use '%>%' instead. – Anish Jan 06 '16 at 17:11

If some ids have only a single row (singletons) and you want to keep their d_2 value as it is (a plain last(d_2) - first(d_2) would return 0 for them), this will solve your problem. It's the same as docendo discimus's answer, but with an if-else component to handle the singleton cases:

library(dplyr)               
diff_df <- C %>%             
   group_by(id) %>%
   summarize(difference = if(n() > 1) last(d_2) - first(d_2) else d_2)
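To see why the if-else matters, here is a small base R sketch of the same rule. It uses a made-up data set in which E7 is a hypothetical singleton id (it is not in the question's data). Without the if-else the singleton gets 0; with it, the singleton keeps its own d_2 value.

```r
# Made-up example: "E7" is a hypothetical id with only one row
C2 <- data.frame(id = c("A1", "A1", "E7"), d_2 = c(-0.63, 0.40, 0.25))

# Plain rule: last minus first, which is 0 for a single-row group
plain <- function(x) x[length(x)] - x[1]

# Singleton-aware rule, mirroring the if-else in the dplyr code above
keep_singleton <- function(x) if (length(x) > 1) x[length(x)] - x[1] else x

r_plain <- tapply(C2$d_2, C2$id, plain)
r_keep  <- tapply(C2$d_2, C2$id, keep_singleton)
print(r_plain)
print(r_keep)
```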
lebelinoz