I have a dataframe like this:

     starttime     sx      sy        time
       <chr>      <chr>   <chr>      <chr>
1  1416924247145  667.75  824.25 1416924247145
2  1416924247145 667.875  824.25 1416924247158
3  1416924247145   668.5   824.5 1416924247198
4  1416924257557  231.25  602.25 1416924257557
5  1416924257557 230.625  602.25 1416924257570
6  1416924257557 229.625 601.875 1416924257597
7  1416924257557  228.75  601.25 1416924257610
8  1416924257557   227.5   600.0 1416924257623
9  1416924257557 216.875  587.75 1416924257717
10 1416924257557 207.125 572.625 1416924257797
11 1416924257600 525.425 525.636 1416924259999

I want a subset of this dataframe containing only the first and last rows for each distinct starttime. In this example, these would be rows 1, 3, 4, 10 and 11. It is important that the first and last rows of the dataframe are also included. I tried to do this with the dplyr package because it seems well suited to the task. I used the group_by(), filter(), first() and last() functions, but I couldn't get the result I wanted. This is what the result should look like:

     starttime     sx      sy        time
       <chr>      <chr>   <chr>      <chr>
1  1416924247145  667.75  824.25 1416924247145
3  1416924247145   668.5   824.5 1416924247198
4  1416924257557  231.25  602.25 1416924257557
10 1416924257557 207.125 572.625 1416924257797
11 1416924257600 525.425 525.636 1416924259999
Flugmango
  • I recommend reading Hadley's guide to window functions using `dplyr`. https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html – Andrew Brēza Aug 01 '16 at 16:20
  • It is sometimes easier to Google than to spend time writing a whole new question that adds no value whatsoever. See also [this](http://stackoverflow.com/questions/8203818/how-to-select-the-first-and-last-row-within-a-grouping-variable-in-a-data-frame), and [this](http://stackoverflow.com/questions/19451032/r-returning-first-row-of-group) and [this](http://stackoverflow.com/questions/31833429/subset-by-first-and-last-value-per-group) – David Arenburg Aug 01 '16 at 16:31

2 Answers

One way to do this with dplyr:

library(dplyr)
df %>% group_by(starttime) %>% slice(unique(c(1, n())))

#Source: local data frame [5 x 4]
#Groups: starttime [3]
#
#     starttime      sx      sy         time
#         <dbl>   <dbl>   <dbl>        <dbl>
#1 1.416924e+12 667.750 824.250 1.416924e+12
#2 1.416924e+12 668.500 824.500 1.416924e+12
#3 1.416924e+12 231.250 602.250 1.416924e+12
#4 1.416924e+12 207.125 572.625 1.416924e+12
#5 1.416924e+12 525.425 525.636 1.416924e+12
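
The `unique()` call matters for groups with a single row, such as the last `starttime` in the sample data: there `c(1, n())` evaluates to `c(1, 1)`, so without `unique()` that row would come back twice. As a rough sketch, the same selection can also be written with `filter()` and `row_number()`, which needs no deduplication. The `toy` data frame and its columns `g` and `v` are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data: group "a" has three rows, group "b" has only one
toy <- data.frame(g = c("a", "a", "a", "b"), v = 1:4)

# Keep rows whose position within the group is the first or the last
res <- toy %>%
  group_by(g) %>%
  filter(row_number() %in% c(1, n())) %>%
  ungroup()

# For comparison: slice() without unique() repeats the lone row of group "b"
dup <- toy %>%
  group_by(g) %>%
  slice(c(1, n())) %>%
  ungroup()

print(res)
print(dup)
```

Because `%in%` yields one logical value per row, a single-row group is kept exactly once in `res`.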

Or using data.table:

library(data.table)
setDT(df)[, .SD[unique(c(1,.N))], starttime]

Data

df <- structure(list(starttime = c(1416924247145, 1416924247145, 1416924247145, 
1416924257557, 1416924257557, 1416924257557, 1416924257557, 1416924257557, 
1416924257557, 1416924257557, 1416924257600), sx = c(667.75, 
667.875, 668.5, 231.25, 230.625, 229.625, 228.75, 227.5, 216.875, 
207.125, 525.425), sy = c(824.25, 824.25, 824.5, 602.25, 602.25, 
601.875, 601.25, 600, 587.75, 572.625, 525.636), time = c(1416924247145, 
1416924247158, 1416924247198, 1416924257557, 1416924257570, 1416924257597, 
1416924257610, 1416924257623, 1416924257717, 1416924257797, 1416924259999
)), .Names = c("starttime", "sx", "sy", "time"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
Sumedh
We can do this with base R by flagging the first and last positions within each starttime group:

i1 <- with(df, as.logical(ave(starttime, starttime, 
      FUN = function(x) seq_along(x) %in% range(seq_along(x)))))
df[i1,]
#      starttime      sx      sy          time
#1  1416924247145 667.750 824.250 1416924247145
#3  1416924247145 668.500 824.500 1416924247198
#4  1416924257557 231.250 602.250 1416924257557
#10 1416924257557 207.125 572.625 1416924257797
#11 1416924257600 525.425 525.636 1416924259999
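
A different base R idiom, shown here only as a sketch and not as the approach used above, relies on `duplicated()`: a row is the first of its group when `duplicated(x)` is `FALSE`, and the last when `duplicated(x, fromLast = TRUE)` is `FALSE`. This assumes "first" and "last" mean the first and last occurrence in the existing row order. The `toy` data frame is made up for illustration.

```r
# Hypothetical toy data: group "a" has three rows, group "b" has only one
toy <- data.frame(g = c("a", "a", "a", "b"), v = 1:4)

# TRUE for the first occurrence of each value of g, or for the last one
keep <- !duplicated(toy$g) | !duplicated(toy$g, fromLast = TRUE)

# Original row names are preserved in the subset
print(toy[keep, ])
```

With the question's data, the same expression would be applied to `df$starttime` and used as `df[keep, ]`.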
akrun