1

I have a large data set of vehicles recorded at every 0.1 seconds, which looks something like this:

   id frame lane class
1   2    13    1     1
2   2    14    1     1
3   2    15    2     1
4   2    16    2     1
5   4    18    3     3
6   4    19    3     3
7   4    20    3     3
8   5    15    2     2
9   5    16    2     2
10  5    17    2     2
11  5    18    3     2
12  5    19    3     2
13  6    14    1     3
14  6    15    1     3
15  6    16    1     3
16  6    17    2     3
17  6    18    2     3

'frame' is the video recording frame ID, 'lane' is the lane # occupied by vehicle, and 'class' is the vehicle classification i.e. 1=motorcycle, 2=car, 3=truck.

Required output

I want to find the first and last occurrence of a vehicle id and its related data in following four columns:

id 
Origin (lane # in the first occurrence of id) 
Destination (lane # in the last occurrence of id) 
class

What I have tried so far:

(Note that input table is the 'input' data frame)

input$first <- !duplicated(input$'id') 
input$last <- !duplicated(input$'id', fromLast=T)
ODTable <- subset(input, m$'first'==T | m$'last'==T)

I got the following output, which gives me the right information but not in the required format:

ODTable


   id frame lane class first  last
1   2    13    1     1  TRUE FALSE
4   2    16    2     1 FALSE  TRUE
5   4    18    3     3  TRUE FALSE
7   4    20    3     3 FALSE  TRUE
8   5    15    2     2  TRUE FALSE
12  5    19    3     2 FALSE  TRUE
13  6    14    1     3  TRUE FALSE
17  6    18    2     3 FALSE  TRUE
Community
  • 1
  • 1
umair durrani
  • 5,597
  • 8
  • 45
  • 85
  • you might want to consider using `data.table` here as it would be significantly cleaner to filter your data. Works just the same with `shiny` too – Ricardo Saporta Dec 03 '13 at 05:53

1 Answers1

1
library(data.table)
input <- as.data.table(input)


setkey(input, "id")

# First
input[.(unique(id)), mult="first"]
   id frame lane class
1:  2    13    1     1
2:  4    18    3     3
3:  5    15    2     2
4:  6    14    1     3

# Last
input[.(unique(id)), mult="last"]
   id frame lane class
1:  2    16    2     1
2:  4    20    3     3
3:  5    19    3     2
4:  6    18    2     3

Putting it all together:

first <- input[.(unique(id)) , mult="first"]
last <- input[.(unique(id)) ,  mult="last"]

Destination <- copy(first)[last, destin := i.lane]
Destination
   id frame lane class destin
1:  2    13    1     1      2
2:  4    18    3     3      3
3:  5    15    2     2      3
4:  6    14    1     3      2
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
  • your answer is similar to mine except the last two columns and although it gives me the required info but it is not in the right format. The required format is one described in the question. I need the lane # in first row of a vehicle id in 'origin' column and the lane # in second row of the same vehicle id in destination column (refer to 'Required output' in question). For example for vehicle id 2, it should look like this: id=2 origin=1 destination=2 class=1 – umair durrani Dec 03 '13 at 06:06
  • Something like: `input[, list(origin=lane[1],destin=lane[.N],class=class[1]), by="id"]` maybe? – thelatemail Dec 03 '13 at 06:22
  • @Ricardo Thankyou! I got my OD table. thelatemail I'll try that one too. – umair durrani Dec 03 '13 at 06:59