0

I have a lot of data in which I have 5 variables : The subject, the date, the date+the hour, the measure which is a concentration and the feeding.

So for each subject, we took some measurements from the date+hour(1) to the date+hour(n). So we have n measurements for each subject. What I would like to do is to calculate the time of record for each row by doing for each subject date+hour[i]-date+hour1. So for that, I did a loop. It was working well until I realise that I have for each subject several days of record. So it means that I have to calculate for each subject and each date, the time of record.

That is my script:

    getwd()
    setwd("H:/OptiMIR LMD files/week1")

    Week1<-read.csv("week1.csv", header=T)
    head(Week1)
    colnames(Week1)<-c("CowID","Date", "DateHour","Measure","Feeding")
    head(Week1)


    #Association colums with class
    Week1$CowID<-as.factor(Week1$CowID)
    Week1$Date<-as.Date(Week1$Date, format = "%d/%m/%Y")
    Week1$DateHour<-strptime(Week1$DateHour, format = "%Y/%m/%d/%H:%M:%S")
    Week1$Measure<-as.numeric(as.vector(Week1$Measure))
    Week1$Feeding<-as.factor(Week1$Feeding)
    str(Week1)

    summary(Week1)
    unique(Week1$CowID) 

    #Calculate Time of measure
    library(lubridate)
    library(foreach)

    Time<-c()
    #nrow(LMD)
    for (i in 1:nrow(Week1)) {
      for (j in unique(Week1$CowID)) {
        for (k in unique(Week1$Date)) {
          if (Week1$CowID[i]==j & Week1$Date[i]==k) {
            foreach(unique(Week1$CowID) & unique(Week1$Date))
            Time[i]<-c(difftime(Week1[i,3], Week1[match(k,Week1$Date),3], units="secs"))
          }
        }
      }
    }

    Week1<-cbind(Week1,Time)​

Here's head and summary:

> head(Week1)
  CowID       Date            DateHour Measure Feeding
1  1990 2014-01-13 2014-01-13 16:21:02     119    hoko
2  1990 2014-01-13 2014-01-13 16:21:02     116    hoko
3  1990 2014-01-13 2014-01-13 16:21:03     111    hoko
4  1990 2014-01-13 2014-01-13 16:21:03      77    hoko
5  1990 2014-01-13 2014-01-13 16:21:04      60    hoko
6  1990 2014-01-13 2014-01-13 16:21:04      65    hoko​

> summary(Week1)
     CowID            Date               DateHour                  
 2239   : 1841   Min.   :2014-01-13   Min.   :2014-01-13 14:33:05  
 2067   : 1816   1st Qu.:2014-01-13   1st Qu.:2014-01-13 16:10:14  
 2246   : 1797   Median :2014-01-14   Median :2014-01-14 15:10:51  
 2062   : 1792   Mean   :2014-01-13   Mean   :2014-01-14 14:55:45  
 2248   : 1757   3rd Qu.:2014-01-15   3rd Qu.:2014-01-15 14:32:59  
 2171   : 1738   Max.   :2014-01-15   Max.   :2014-01-15 15:55:09  
 (Other):14259                                                     
    Measure        Feeding     
 Min.   :   4.0   hoko :16857  
 1st Qu.:  65.0   strap: 8143  
 Median : 108.0                
 Mean   : 147.4                
 3rd Qu.: 185.0                
 Max.   :1521.0              ​

So for 1990, I will have other dates of record. And that's my problem because this loop :

Time<-c()
for (i in 1:nrow(Week1) {
  for (j in unique(Week1$CowID)) {
    for (k in min(Week1$Date):max(Week1$Date)) {
      if ((week1$CowID[i]==j) & (Week1$Date[i]==k)) {
        Time[i]<-c(difftime(Week1[i,3], Week1[match(k, Week1$Date),3], units="secs"))
      }
    }
  }
}

works when I have one day of measure / subject. but now I have several days of record, it works for one subject but when it comes to another subject, I have negative time of records...

I think I know where is the problem : in the loop, "for k...". I have to tell R that he has to look at one date FOR each unique subject. But I don't know how to do that

Thanks

help-info.de
  • 6,695
  • 16
  • 39
  • 41
Marie
  • 45
  • 1
  • 10
  • These loops are the hard way to do this. The easy way is `dplyr` or `data.table`. Using `dplyr`, I think what you want is `group_by(Week1, CowID, Date) %>% mutate(Time = DateHour - min(DateHour))`, but it's hard to know for sure. Can you show your desired output for the `head` of data that you show? – Gregor Thomas Oct 13 '16 at 16:40
  • Okay.. I'm gonna investigate on that... The first values I can obtain for the vector Time are : [1] 0 0 1 1 2 2 3 4 4 5 5 6 [13] 6 7 7 8 8 9 9 10 10 11 11 12 [25] 12 13 13 14 15 15 16 16 17 17 18 18 [37] 19 19 20 20 21 21 22 22 23 23 24 24 This is correct but when it's a new subject (cowID), it gives wrong result like it didn't take into account that it was another CowID – Marie Oct 13 '16 at 16:46
  • If you want to fix your loop code, I think the biggest problem is that your outermost loop is over all the rows. You're sort of working around it with the `match`, but the more natural way to do it with loops would be to use the groups as the outer loops and then the innermost loop go over each row in the group. – Gregor Thomas Oct 13 '16 at 16:49
  • Don't put output in comments, it's difficult to make sense of. Edit it into your question (preferably adding it to the data frame) so that we can see. You may also want to look at [tips for making reproducible examples](http://stackoverflow.com/q/5963269/903061). This is a pretty good question, but would be even better if your data was shared reproducibly, something like `dput(droplevels(head(Week1, 10)))` - or some other small subset that has a couple cows and a couple days, enough to illustrate the problem. The `dput()` output looks ugly, but it's copy/pasteable into R to recreate your data. – Gregor Thomas Oct 13 '16 at 16:52
  • Okay thank you, I'm gonna update that. I tried the group_by and an error appears : "Error in eval(expr, envir, enclos) : column 'DateHour' has unsupported class : POSIXlt, POSIXt" – Marie Oct 13 '16 at 18:31
  • For the `dplyr` to work you will need to convert `POSIXlt` to `POSIXct` [as shown here](http://stackoverflow.com/q/30063190/903061). – Gregor Thomas Oct 13 '16 at 18:35
  • Wow, it works perfectly and it's super fast ! The data I'm working on have 130 000 observations, hope it will be that fast ! Thanks. That was super simple! – Marie Oct 13 '16 at 18:47

1 Answers1

0

For loops are a poor way to do operations by group in R. data.table and dplyr provide faster and friendlier alternatives:

library(dplyr)
group_by(Week1, CowID, Date) %>% 
    mutate(Time = DateHour - min(DateHour))

Note that if your datetime columns are POSIXlt class then you will need to first convert to POSIXct with as.POSIXct().

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Thanks! But I have a problem in my str(Week1), there're a lot of new terms that appear (See in the post) – Marie Oct 14 '16 at 09:10