How can I identify the first/last observations within a group in R?

Question

I want to extract the first and last row within each group of a data frame in R. The data are long (~300,000 observations) with a couple of thousand groups. For each group, I want the first and last observation (in this case, the first and last latitude/longitude of each of a couple of thousand survey transects).

I came up with a for-loop solution that may work by subsetting the data one group at a time, but I wanted to see whether there is a cleaner way to go about this problem:

library(tidyverse) 


#example survey data along CA coastline

example.data = data.frame(group = c(rep('A',20),rep('B',20),rep('C',20)),
                   latitude = seq(32,38, length.out = 60),  #60 evenly spaced values from 32 to 38
                   longtitude = seq(-119,-122,length.out = 60)) 

head(example.data)

This looks like:

group latitude longtitude
    A 32.00000  -119.0000
    A 32.10169  -119.0508
    A 32.20339  -119.1017
    A 32.30508  -119.1525
    A 32.40678  -119.2034

Here was my solution using for-loops:

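#find groups (i.e. transects)
#unique() works for both character and factor columns; levels() returns NULL
#for a character column, which is what data.frame() creates by default since R 4.0
letter.levels = unique(example.data$group)

first_last = c()

for(i in seq_along(letter.levels)){
  d = filter(example.data, group == letter.levels[i])
  first = d[1,]        #first row of this group
  last = d[nrow(d),]   #last row of this group

  #prepending each pair leaves the groups in reverse order (C, B, A)
  first_last = rbind(first,last,first_last)
}

#view results
first_last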

The final result I'm looking for is this (start/stop locations for each survey transect):

group latitude longtitude
    C  36.0678  -121.0339
    C  38.0000  -122.0000
    B  34.0339  -120.0169
    B  35.9661  -120.9831
    A  32.0000  -119.0000
    A  33.9322  -119.9661

Could there be a cleaner dplyr version of this that I missed? If nothing else, I can always fall back on this for-loop version.

I searched for help and found a somewhat related question and another (but different) for-loop suggestion.

I agree, @SymbolixAU But, that question specifies dplyr only solutions? — Hector Haffenden, Apr 01 '19 at 23:11
`dplyr` is part of the `tidyverse` the OP has loaded. There is also a faster `data.table` solution, and a base option in the answers. — SymbolixAU, Apr 01 '19 at 23:12
I think it's a candidate. The only thing stopping me marking it as such is I can't see a way of knowing the 'order' of data in the groups. — SymbolixAU, Apr 01 '19 at 23:15
possibly #2: https://stackoverflow.com/questions/8203818/how-to-select-the-first-and-last-row-within-a-grouping-variable-in-a-data-frame — markus, Apr 01 '19 at 23:20
Why `example.data %>% group_by(group) %>% slice(c(1, n())) ` doesn't solve your problem ? — Ronak Shah, Apr 01 '19 at 23:20
I think it's just the result they want at the end, in terms of ordering @RonakShah But yes, this question could, and probably should be converted to, using the above links, then a simple reshaping task... — Hector Haffenden, Apr 01 '19 at 23:23
just reverse order? `example.data %>% group_by(rev(group)) %>% slice(c(1, n())) ` ? — Ronak Shah, Apr 01 '19 at 23:24
Thank you @SymbolixAU - I think that will be the solution I was looking for - though there are other great ideas within the comments as well — Kodiakflds, Apr 02 '19 at 15:29
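
For reference, the `slice(c(1, n()))` idea from the comments can be sketched as below. This is only a minimal sketch, assuming the rows within each group are already in survey order (the concern raised about knowing the 'order' of the data still applies). The `unique()` wrapper is an addition not in the original comment: it keeps a group with a single row from being returned twice.

```r
library(dplyr)

# Same example data as in the question (column name kept as spelled there).
example.data <- data.frame(
  group = c(rep("A", 20), rep("B", 20), rep("C", 20)),
  latitude = seq(32, 38, length.out = 60),
  longtitude = seq(-119, -122, length.out = 60)
)

# Keep the first and last row of each group, using the existing row order.
# unique() guards against duplicating the row of a one-row group.
first_last <- example.data %>%
  group_by(group) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()

first_last
```

The result holds the same start/stop pairs as the for-loop, but with the groups in A, B, C order rather than the loop's reversed C, B, A stacking. If the reversed order matters, sorting afterwards (for example with `arrange(desc(group))`) is probably simpler than changing the grouping.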
