Keep the first timestamp of a specific list

Question

Having a dataframe like this:

dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google", 
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01", 
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02", 
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

And a vector of names:

list <- c("Google", "Yahoo", "Amazon")

How can I get output like this:

id   name       date
 1 Google 2008-11-01
 1  Yahoo 2008-11-01
 1 Amazon 2008-11-04
 2 Amazon 2008-11-01
 2 Google 2008-11-02

For every id, I want to keep each name from the list at its first (earliest) date. I tried this:

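library(data.table)
library(tidyverse)
library(reshape2)
library(zoo)
setDT(dframe)  # convert to a data.table so the [, j, by = ] syntax works
# returns only the first row of each id, not of each id/name pair
date_list_first = dframe[, head(.SD, 1), by = .(id)]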

`dframe %>% group_by(id, name) %>% slice(1L) ` using `dplyr` — Ronak Shah, Sep 27 '19 at 09:15
Does the column `name` only have the names from the list, or are there others not included in your `list`? Note: don't name your objects after predefined functions; `list` is a base R function — Sotos, Sep 27 '19 at 09:22
Maybe something like `dframe[(!duplicated(dframe[c("id","name")])) & dframe$name %in% list,]` if you need to remove dups and only keep those in the specified list. — thelatemail, Sep 27 '19 at 09:25
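A minimal base R sketch of the idea in the last comment, with one extra step: sorting by `id` and `date` first, so that `duplicated()` keeps the earliest row of each `id`/`name` pair rather than whichever row happens to come first. `keep_names` is a hypothetical stand-in for the question's `list` vector, renamed so it does not mask `base::list`. The sketch assumes the dates stay as ISO `YYYY-MM-DD` strings, which sort chronologically as plain text.

```r
dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google",
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01",
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02",
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

# Hypothetical name for the vector of names to keep (avoids masking base::list)
keep_names <- c("Google", "Yahoo", "Amazon")

# Sort by id, then date; ISO date strings sort chronologically as text
sorted <- dframe[order(dframe$id, dframe$date), ]

# duplicated() flags every repeat of an (id, name) pair after its first,
# i.e. earliest, row; %in% drops names that are not in keep_names
first_per_pair <- sorted[!duplicated(sorted[c("id", "name")]) &
                           sorted$name %in% keep_names, ]
rownames(first_per_pair) <- NULL
print(first_per_pair)
```

Sorting before calling `duplicated()` is what makes "first" mean "earliest date"; without it, the result depends on the row order of the input.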

webb · Accepted Answer · 2019-10-09T11:46:04.257


Here's how to do it using data.table:

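library(data.table)
setDT(dframe)
# sort by id, then date (ISO date strings sort chronologically),
# then keep the first, i.e. earliest, row of each id/name pair
date_list_first = unique(dframe[order(id, date)], by = c('id', 'name'))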

edited Oct 09 '19 at 11:46

answered Sep 27 '19 at 09:18


Thank you, but this doesn't match the expected output. I want to keep every name from the list at its first date. – Nathalie Sep 27 '19 at 09:42
Ah, now I understand: for each combination of id and name, you want to keep the row with the earliest date. Fixed. – webb Oct 09 '19 at 11:46
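The same rule (earliest date for each combination of id and name) can also be written as a grouped minimum with base R's `aggregate()`. This is a sketch rather than the answer's method; `keep_names` is again a hypothetical name for the question's vector, and the sketch assumes ISO `YYYY-MM-DD` date strings, so `min()` on the text returns the earliest date.

```r
dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google",
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01",
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02",
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

# Hypothetical name for the vector of names to keep
keep_names <- c("Google", "Yahoo", "Amazon")

# One row per (id, name) pair, holding that pair's smallest date string
earliest <- aggregate(date ~ id + name,
                      data = dframe[dframe$name %in% keep_names, ],
                      FUN = min)

# aggregate() returns groups in its own order; reorder by id, then date
earliest <- earliest[order(earliest$id, earliest$date), ]
rownames(earliest) <- NULL
print(earliest)
```

Unlike the sort-then-deduplicate approach, this keeps only the grouping columns and the aggregated date, so any other columns in the data would need to be merged back in.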
