
I've looked extensively on Stack Overflow for a solution, but have yet to find one that works for me. I have a data frame that looks something like this:

id    time    latitude    longitude
A     11:10   381746.0    6008345
A     11:11   381726.2    6008294
B     10:56   381703.0    6008214
B     10:57   381679.7    6008134
C     4:30    381654.4    6008083
C     4:31    381629.2    6008033

I would like to insert a new row at the end of each id. In this row, 'id' and 'time' should be the same as in the previous observation, and latitude and longitude should be '394681.4' and '6017550' (the end location shared by all ids).

id    time    latitude    longitude
A     11:10   381746.0    6008345
A     11:11   381726.2    6008294
A     11:11   394681.4    6017550
B     10:56   381703.0    6008214
B     10:57   381679.7    6008134
B     10:57   394681.4    6017550
C     4:30    381654.4    6008083
C     4:31    381629.2    6008033
C     4:31    394681.4    6017550

Can anyone think of a solution? dplyr or data.table solutions preferred.

Splash1199

5 Answers

We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), get the last row of each 'id' group with tail, assign the new values to 'latitude' and 'longitude', rbind the result with the original dataset, and order by 'id'.

library(data.table)
rbind(setDT(df1), df1[, tail(.SD, 1) , by = id
        ][, c("latitude", "longitude") := .(394681.4,  6017550)
         ])[order(id)]
#    id  time latitude longitude
#1:  A 11:10 381746.0   6008345
#2:  A 11:11 381726.2   6008294
#3:  A 11:11 394681.4   6017550
#4:  B 10:56 381703.0   6008214
#5:  B 10:57 381679.7   6008134
#6:  B 10:57 394681.4   6017550
#7:  C  4:30 381654.4   6008083
#8:  C  4:31 381629.2   6008033
#9:  C  4:31 394681.4   6017550

Or using dplyr, with a similar approach:

library(dplyr)
df1 %>%
   group_by(id) %>%
   summarise(time = last(time)) %>%
   mutate(latitude = 394681.4, longitude = 6017550) %>% 
   bind_rows(df1, .) %>% 
   arrange(id)
akrun

A base R solution using the split-apply-combine concept.

do.call(rbind, lapply(split(df, df$id), 
                      function(x) rbind(x,
                         within(x[nrow(x),], {latitude <- 394681.4; longitude <- 6017550}))))

which returns

     id  time latitude longitude
A.1   A 11:10 381746.0   6008345
A.2   A 11:11 381726.2   6008294
A.21  A 11:11 394681.4   6017550
B.3   B 10:56 381703.0   6008214
B.4   B 10:57 381679.7   6008134
B.41  B 10:57 394681.4   6017550
C.5   C  4:30 381654.4   6008083
C.6   C  4:31 381629.2   6008033
C.61  C  4:31 394681.4   6017550

split breaks the data.frame into a list of data.frames, lapply rbinds the final row to each data.frame, and do.call rbinds the resulting list of data.frames. The final row of each data.frame is produced using within, which returns a modified version of the data.frame it is given; nrow is used to select the final row. Following @akrun's answer, x[nrow(x),] could be replaced with tail(x, 1).
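
As a rough sketch of the tail(x, 1) variant mentioned above, the per-group step can be pulled out into a small helper. The name add_end_row is made up for illustration, and the example data is rebuilt here with time as character, which is an assumption about the original column types. Resetting the row names at the end removes the "A.1", "A.21" style labels.

```r
# Example data from the question (column types are an assumption)
df <- data.frame(
  id = c("A", "A", "B", "B", "C", "C"),
  time = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append a copy of the group's last row,
# with latitude and longitude overwritten by the end location
add_end_row <- function(x) {
  rbind(x, within(tail(x, 1), {latitude <- 394681.4; longitude <- 6017550}))
}

res <- do.call(rbind, lapply(split(df, df$id), add_end_row))
rownames(res) <- NULL  # replace the "A.1", "A.21", ... labels with 1..n
res
```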

lmo

Using dplyr:

library(dplyr)

# data:
df <- structure(list(id = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), time = structure(c(3L, 4L, 1L, 
2L, 5L, 6L), .Label = c("10:56", "10:57", "11:10", "11:11", "4:30", 
"4:31"), class = "factor"), latitude = c(381746, 381726.2, 381703, 
381679.7, 381654.4, 381629.2), longitude = c(6008345L, 6008294L, 
6008214L, 6008134L, 6008083L, 6008033L)), .Names = c("id", "time", 
"latitude", "longitude"), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

# code:
df %>% group_by(id) %>% 
    do({ df <- . 
         last_row           <- df %>% slice(n())
         last_row$latitude  <- 394681.4
         last_row$longitude <- 6017550
         df                 <- bind_rows(df, last_row)
    })
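
Since do() is marked as superseded from dplyr 1.0.0 onwards, the same per-group idea can probably be written with group_modify() instead. The following is only a sketch under that assumption: the data is rebuilt as a plain data.frame with character time (not the factor version above), and result is just an illustrative name.

```r
library(dplyr)

# Example data from the question (column types are an assumption)
df <- data.frame(
  id = c("A", "A", "B", "B", "C", "C"),
  time = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id) %>%
  # .x holds the rows of one id (without the id column);
  # append a copy of its last row with the end location filled in
  group_modify(~ bind_rows(
    .x,
    mutate(slice_tail(.x, n = 1), latitude = 394681.4, longitude = 6017550)
  )) %>%
  ungroup()

result
```

Because each new row is appended inside its own group, no arrange() step is needed afterwards.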
Rasmus Larsen

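dplyr method, using slice_tail() to take the last row of each id and mutate() to overwrite its coordinates:

library(dplyr)

df1 %>%
   group_by(id) %>%
   slice_tail(n = 1) %>%                                  # last row of each id
   mutate(latitude = 394681.4, longitude = 6017550) %>%   # set the end location
   ungroup() %>%
   bind_rows(df1, .) %>%                                  # original rows first, new rows after
   arrange(id)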
Adarsh N

Another tidyverse option would be a combination of tidyr::complete to add the rows and tidyr::fill to fill the missing values in time:

library(tidyr)
library(dplyr)

df |> 
  tidyr::complete(id, latitude = 394681.4, longitude = 6017550) |> 
  select(id, time, everything()) |> 
  arrange(id, time) |> 
  tidyr::fill(time)
#> # A tibble: 9 × 4
#>   id    time  latitude longitude
#>   <fct> <fct>    <dbl>     <dbl>
#> 1 A     11:10  381746    6008345
#> 2 A     11:11  381726.   6008294
#> 3 A     11:11  394681.   6017550
#> 4 B     10:56  381703    6008214
#> 5 B     10:57  381680.   6008134
#> 6 B     10:57  394681.   6017550
#> 7 C     4:30   381654.   6008083
#> 8 C     4:31   381629.   6008033
#> 9 C     4:31   394681.   6017550
stefan