-1

I have a dataframe that kind of looks like this:

    ID    Day  Value
    A     1    1.4
    A     2    3.4
    A     3    5.6
    A     4    6.7
    B     1    2.3
    B     2    3.4
    B     2    3.5
    C     1    2.3
    C     2    4.6
    C     3    6.8

I want to add a column "Difference to previous day". As you can see, not all Objects in the table have values until the last day and sometimes there are multiple measurements for one day (B2), which should be summarized by taking the mean.

What is a good way to do this in R? I'm using a loop at the moment to go through each row and find the matching ones from the previous day, but it takes forever and doesn't feel like you're supposed to do it that way.

Bonus Question: Measurements were not always taken at the same time of day, so in truth the Day column is a float (2.4553 days from the start of the experiment). It's not necessary to normalize the differences in any way since the resulting diff will end up in a plot with Day[Float] at the X axis, but it would be nice if there'd be a solution that doesn't rely on Day - 1 for finding the previous entry.

DP.
  • 581
  • 4
  • 15

2 Answers2

2

Answer using dplyr assuming your dataframe is named 'df':

library(dplyr)

output <- df %>% 
  group_by(ID, Day) %>% 
  summarise(Value = mean(Value)) %>% # ensure only 1 Value per Day/ID
  mutate(PrevValue = lag(value, n=1), # create column that is previous days value                            
         DiffToPreviousDay = PrevValue - Value) # calculate difference

If day is a float and instead of finding the difference from previous day you would like the difference from the previous observation then simply omit the summarise:

library(dplyr)

output <- df %>% 
  group_by(ID, Day) %>% 
  mutate(PrevValue = lag(value, n=1), # create column that is previous days value                            
         DiffToPreviousObservation = PrevValue - Value) # calculate difference
Tom Haddow
  • 230
  • 1
  • 10
1

Maybe you can try the following code, which is a base R solution:

df <- within(df,difference <- ave(ave(Value, ID, Day, FUN = mean),ID, FUN = function(x) c(NA,diff(x))))

such that

> df
   ID Day Value difference
1   A   1   1.4         NA
2   A   2   3.4       2.00
3   A   3   5.6       2.20
4   A   4   6.7       1.10
5   B   1   2.3         NA
6   B   2   3.4       1.15
7   B   2   3.5       0.00
8   C   1   2.3         NA
9   C   2   4.6       2.30
10  C   3   6.8       2.20

DATA

df <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "C", 
"C", "C"), Day = c(1L, 2L, 3L, 4L, 1L, 2L, 2L, 1L, 2L, 3L), Value = c(1.4, 
3.4, 5.6, 6.7, 2.3, 3.4, 3.5, 2.3, 4.6, 6.8)), class = "data.frame", row.names = c(NA, 
-10L))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • I like this! It's just a bit dangerous for plotting that row 7 has a difference of 0... – DP. Jan 08 '20 at 13:45