Trying to loop through a dataframe

Question

I am trying to calculate a driver's total active driving time using GPS data. I've written a loop intended to calculate the time difference between each pair of consecutive points in a dataframe, summing the differences as it goes.

However, the final output is much smaller than expected, on the order of seconds instead of hundreds of hours, which leads me to believe that it is only looping a few times or not summing the values correctly. My programming knowledge is mostly from Python. Am I implementing this idea correctly in R, or could I write it better? My data looks something like this:

DriveNo       Date.and.Time Latitude Longitude
1     264 2014-02-01 12:12:05 41.91605  12.37186
2     264 2014-02-01 12:12:05 41.91605  12.37186
3     264 2014-02-01 12:12:12 41.91607  12.37221
4     264 2014-02-01 12:12:27 41.91619  12.37365
5     264 2014-02-01 12:12:42 41.91627  12.37490
6     264 2014-02-01 12:12:57 41.91669  12.37610

Is there a way I can save the result of each iteration to a list so that I could analyse where in the range of values a problem might be occurring?

datelist = taxi_264$Date.and.Time
dlstandard = as.POSIXlt(datelist)
diffsum = 0
for (i in range(1:83193))
{
  diff = difftime(dlstandard[i], dlstandard[(i+1)], units = "secs")
  diffsum = diffsum + diff
}

Welcome to Stack Overflow. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some example data in a plain text format. For example, enough rows from `dlstandard` to generate some output. — neilfws, Apr 20 '21 at 01:26
If you are summing up the difference between each row then just subtract the first row from the last. — Dave2e, Apr 20 '21 at 01:59
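
A likely cause of the small total is `range(1:83193)`: in R, `range()` returns only the minimum and maximum of its argument, so the loop body runs just twice (for `i = 1` and `i = 83193`), unlike Python's `range`. Below is a minimal sketch of a corrected loop that also stores each step so it can be inspected afterwards. It uses a made-up six-row sample based on the timestamps shown in the question; the names `sample_times`, `loop_indices`, `step_secs`, and `total_secs` are illustrative and not from the original post.

```r
# Hypothetical sample mirroring the timestamps shown in the question
sample_times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                             "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                             "2014-02-01 12:12:42", "2014-02-01 12:12:57"),
                           tz = "UTC")

# range() returns c(min, max), so a for loop over it visits only two indices
loop_indices <- range(1:6)

n <- length(sample_times)
step_secs <- numeric(n - 1)  # one slot per consecutive pair
for (i in seq_len(n - 1)) {
  # later minus earlier, so each difference is non-negative
  step_secs[i] <- as.numeric(difftime(sample_times[i + 1], sample_times[i],
                                      units = "secs"))
}
total_secs <- sum(step_secs)
```

Inspecting `step_secs` afterwards, for example with `which(step_secs > 60)`, shows where in the data unusually large gaps occur.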

Tim Biegeleisen · Answer 1 · 2021-04-20T02:00:05.857

0

You could avoid the loop by using the lead() function from dplyr:

library(dplyr)

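# lead() takes `default` (a single value); using the last timestamp makes the final difference zero
diff <- difftime(lead(dlstandard, 1, default = dlstandard[length(dlstandard)]), dlstandard, units="secs")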
diffsum <- sum(diff)

Note that the above is a vectorized way of solving your problem, and is usually the way to go when using R.

edited Apr 20 '21 at 02:00

answered Apr 20 '21 at 01:28

Tim Biegeleisen


So the 'lead()' function removes the need for the i and i+1 structure, but does this code still compute the difference over the entire range of consecutive values or just once? Using this gives me an NA answer for some reason. – FrankYaygrrr Apr 20 '21 at 01:50
@FrankYaygrrr The problem could be happening because, for the final entry in `dlstandard`, the lead would return `NA`. I have updated my code so that `lead` now returns the last value itself for that entry, so the diff is zero and does not contribute to the sum. – Tim Biegeleisen Apr 20 '21 at 02:00

score 0 · Accepted Answer · answered Apr 20 '21 at 01:53

You can try:

diffsum <- as.numeric(sum(difftime(tail(dlstandard, -1), 
                                   head(dlstandard, -1), units = 'secs')))

This gives `diffsum` as the sum of the differences in seconds.
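
To make the pairing explicit, here is a small sketch on made-up sample data (the names `sample_times`, `later`, `earlier`, `gaps`, and `span` are illustrative, not from the answer). `tail(x, -1)` drops the first element and `head(x, -1)` drops the last, so the two vectors line up each timestamp with its predecessor. For sorted timestamps, the sum of all consecutive gaps should equal the last timestamp minus the first, which is a quick sanity check.

```r
# Hypothetical sample mirroring the timestamps shown in the question
sample_times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                             "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                             "2014-02-01 12:12:42", "2014-02-01 12:12:57"),
                           tz = "UTC")

later   <- tail(sample_times, -1)  # elements 2..n
earlier <- head(sample_times, -1)  # elements 1..(n-1)

# Per-pair gaps in seconds, as plain numbers
gaps <- as.numeric(difftime(later, earlier, units = "secs"))
total <- sum(gaps)

# Sanity check: the gaps telescope to (last - first)
span <- as.numeric(difftime(sample_times[length(sample_times)],
                            sample_times[1], units = "secs"))
```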

Ronak Shah


This works and gives me an answer roughly equivalent to 28 days, which is the period over which the study took place, whereas I am looking for just the time during which the taxi is actively being driven. Is there a way I could modify this code to disregard difftime values larger than, say, 1 minute, so that they are not added to the total diffsum? Thank you – FrankYaygrrr Apr 20 '21 at 02:26
@FrankYaygrrr Try `diffsum <- as.numeric(sum(Filter(function(x) x <= 60, difftime(tail(dlstandard, -1), head(dlstandard, -1), units = 'secs'))))` – Ronak Shah Apr 20 '21 at 02:42
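
An alternative to `Filter` is plain logical indexing on a numeric vector of gaps, which is vectorized and keeps the threshold easy to adjust. The sketch below uses made-up timestamps with one long idle gap; `sample_times`, `gaps`, `max_gap_secs`, and `active_secs` are illustrative names, and the 60-second cutoff is simply the value suggested in the comment above.

```r
# Hypothetical sample: five short gaps, one long idle gap, then one short gap
sample_times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                             "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                             "2014-02-01 12:12:42", "2014-02-01 12:12:57",
                             "2014-02-01 14:00:00", "2014-02-01 14:00:10"),
                           tz = "UTC")

# Consecutive gaps in seconds, converted to plain numbers
gaps <- as.numeric(difftime(tail(sample_times, -1), head(sample_times, -1),
                            units = "secs"))

max_gap_secs <- 60  # assumed cutoff: longer gaps are treated as idle time

# Keep only the gaps at or below the cutoff and add them up
active_secs <- sum(gaps[gaps <= max_gap_secs])
```

Here the 6423-second gap between 12:12:57 and 14:00:00 is excluded, so only the short gaps contribute to `active_secs`.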
