-1

i am working with csv file and i have a column with name "statistics_lastLocatedTime" as shown in csv file image i would like to subtract second row of "statistics_lastLocatedTime" from first row; third row from second row and so on till the last row and then store all these differences in a separate column and then combine this column to the other related columns as shown in the code given below:

##select related features
data <- read.csv("D:/smart tech/store/2016-10-11.csv")
(columns <- data[with(data, macAddress == "7c:11:be:ce:df:1d" ),
    c(2,10,11,38,39,48,50) ])
write.csv(columns, file = "updated.csv", row.names = FALSE)

## take time difference 
date_data <- read.csv("D:/R/data/updated.csv")
(dates <- date_data[1:40, c(2)])
NROW(dates)
for (i in 1:NROW(dates)) {
  j <- i+1
  r1 <- strptime(paste(dates[i]),"%Y-%m-%d %H:%M:%S")
  r2 <- strptime(paste(dates[j]),"%Y-%m-%d %H:%M:%S")
  diff <- as.numeric(difftime(r1,r2))
  print (diff)
}

## combine time difference with other related columns
combine <- cbind(columns, diff)
combine

now the problem is that i am able to get the difference of rows but not able to store these values as a column and then combine that column with other related columns. please help me. thanks in advance.

smci
  • 32,567
  • 20
  • 113
  • 146
  • Why are you filtering your `date_data` to first 40 rows? This will not be able to cbind back to `columns`. – Parfait Nov 20 '16 at 01:33
  • actually total number of rows in macAddress == "7c:11:be:ce:df:1d" are only 40 and it doesn't metter if i use (dates <- date_data[1:40, c(2)]) or (dates <- date_data[, c(2)]). commands will give the same output i.e. all 40 rows where macAddress == "7c:11:be:ce:df:1d" – Waqas Bari Nov 20 '16 at 01:41
  • You want to calculate the diff in `statistics_lastLocatedTime`, and store that vector back in the dataframe. – smci Nov 20 '16 at 07:31
  • There's no need for the for-loop, `diff()` can operate on vectors. There's no need to repeatedly convert each pair of datetimes from string into datetime, just read them in as datetimes with `read.csv()` – smci Nov 20 '16 at 07:38
  • Related: [Specify custom Date format for colClasses argument in read.table/read.csv](http://stackoverflow.com/questions/13022299/specify-date-format-for-colclasses-argument-in-read-table-read-csv) – smci Nov 20 '16 at 09:13

2 Answers2

1

Consider directly assigning the diff variable using vapply. Also, there is no need for the separate date_data df as all operations can be run on the columns df. Notice too the change in time format to align to the format currently in dataframe:

columns$diff <- vapply(seq(nrow(columns)), function(i){      
  r1 <- strptime(paste(columns$statistics_lastLocatedTime[i]),"%d-%m-%y %H:%M")
  r2 <- strptime(paste(columns$statistics_lastLocatedTime[i+1]),"%d-%m-%y %H:%M")

  diff <- difftime(r1, r2)      
}, numeric(1))
Parfait
  • 104,375
  • 17
  • 94
  • 125
1

This is a four-liner:

  1. Define a custom class 'myDate', and a converter function for your custom datetime, as per Specify custom Date format for colClasses argument in read.table/read.csv
  2. Read in the datetimes as actual datetimes; no need to repeatedly convert later.
  3. Simply use the vectorized diff operator on your date column (it sees their type, and automatically dispatches a diff function for POSIXct Dates). No need for for-loops:

.

setClass('myDate') # this is not strictly necessary
setAs('character','myDate', function(from) { 
  as.POSIXct(from, format='%d-%m-%y %H:%S', tz='UTC') # or whatever timezone
})

data <- read.csv("D:/smart tech/store/2016-10-11.csv",
            colClasses=c('character','myDate','myDate','numeric','numeric','integer','factor'))
# ...
data$date_diff <- c(NA, diff(data$statistics_lastLocatedTime))

Note that diff() produces a result of length one shorter than vector that we diff'ed. Hence we have to pad it (e.g. with a leading NA, or whatever you want).

Community
  • 1
  • 1
smci
  • 32,567
  • 20
  • 113
  • 146
  • Great answer! I'm adding `setAs()` to my library. However, you may need to add `setClass('myDate')` else the error: *no definition for class “myDate”*. And how did I miss `diff()`? Be sure to fix the ending quote of *integer* in colClasses. – Parfait Nov 20 '16 at 15:45
  • @Parfait no we don't need to add `setClass('myDate')`. It works fine without – smci Nov 20 '16 at 21:16
  • 1
    Turns out it is a warning and not an error, but still processes as this question indicates: http://stackoverflow.com/questions/14146341/using-setas-to-specify-colclasses-in-r – Parfait Nov 20 '16 at 22:14
  • @smci : thanks a lot for your help. 1) your closing bracket was missing in read.csv 2) my output was NA and when i removed format in as.POSIXct it worked fine – Waqas Bari Nov 22 '16 at 07:04
  • @WaqasBari: thanks but I don't understand how removing the format option changes anything; `format(NA...)` is still NA – smci Nov 22 '16 at 07:48
  • @WaqasBari by the way, `lubridate` package has much more user-friendly date-conversion fns. – smci Nov 22 '16 at 07:49
  • @smci i hadn't changed (format (NA...)) in your code but i got accurate results when i removed in it my code and use only as.POSIXct(from) . thanks for informing me about lubridate package. i used "date" and "chron" packages – Waqas Bari Nov 22 '16 at 08:06
  • Yeah, as.POSIXct is a pain. – smci Nov 22 '16 at 10:30