I want to extract the first and last row of each group in a data frame in R. I have a long data set (~300,000 observations) with a couple of thousand groups, and for each group I want the first and last observation. (In this case, I am extracting the first and last latitude/longitude for a couple of thousand survey transects.)
I came up with a for-loop solution that may work by subsetting the data one group at a time, but I wanted to see whether there is a cleaner way to approach this problem:
library(tidyverse)
#example survey data along CA coastline
example.data = data.frame(group = c(rep('A',20),rep('B',20),rep('C',20)),
latitude = seq(32,38, length.out = 60), #60 evenly spaced values from 32 to 38
longtitude = seq(-119,-122,length.out = 60))
head(example.data)
This looks like:
group latitude longtitude
A 32.00000 -119.0000
A 32.10169 -119.0508
A 32.20339 -119.1017
A 32.30508 -119.1525
A 32.40678 -119.2034
Here was my solution using for-loops:
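#find groups (i.e. transects)
#unique() works whether group is a factor or a character column;
#levels() returns NULL for character columns (the data.frame() default since R 4.0)
letter.levels = unique(as.character(example.data$group))
first_last = c()
for(i in seq_along(letter.levels)){
  #subset the rows belonging to one group
  d = filter(example.data, group == letter.levels[i])
  d.len = nrow(d)
  first = d[1,]
  last = d[d.len,]
  #prepend this group's rows, so later groups end up on top
  first_last = rbind(first,last,first_last)
}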
#view results
first_last
The final result I'm looking for is this (start/stop locations for each survey transect):
group latitude longtitude
C 36.0678 -121.0339
C 38.0000 -122.0000
B 34.0339 -120.0169
B 35.9661 -120.9831
A 32.0000 -119.0000
A 33.9322 -119.9661
Could there be a cleaner dplyr version of this that I missed? If nothing else, I can always fall back on this for-loop version.
I searched for help and found a somewhat related question and another (but different) for-loop suggestion.
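To make the request concrete, here is a minimal sketch of the kind of dplyr version I have in mind. It assumes that `group_by()` followed by `slice()` with `n()` (the size of the current group) is an acceptable way to pick rows by position. It is an illustration on the example data only and has not been tried on the full data set.

```r
library(dplyr)

# Same example data as above (column name `longtitude` kept as-is)
example.data <- data.frame(
  group = c(rep("A", 20), rep("B", 20), rep("C", 20)),
  latitude = seq(32, 38, length.out = 60),
  longtitude = seq(-119, -122, length.out = 60)
)

# Keep row 1 and row n() of each group; n() is the number of rows in the current group
first_last <- example.data %>%
  group_by(group) %>%
  slice(c(1, n())) %>%
  ungroup()

first_last
```

Unlike the loop, this returns the groups in sorted order (A, B, C) rather than C, B, A. A group with a single row would appear twice, since both positions point at the same row; `slice(unique(c(1, n())))` should avoid that if it matters.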