
Given a data frame like this:

dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google", 
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01", 
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02", 
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

and a vector of names:

list <- c("Google", "Yahoo", "Amazon")

How can I get output like this:

id   name       date
 1 Google 2008-11-01
 1  Yahoo 2008-11-01
 1 Amazon 2008-11-04
 2 Amazon 2008-11-01
 2 Google 2008-11-02

For every id, I want to keep each name from the list once, on its first (earliest) date. I tried this:

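library(data.table)
setDT(dframe)  # convert to a data.table so the [ , , by = ] syntax works
# Groups by id only, so this returns just one row per id (not one per id and name)
date_list_first = dframe[, head(.SD, 1), by = .(id)]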
Nathalie
  • Using `dplyr`: `dframe %>% group_by(id, name) %>% slice(1L)` – Ronak Shah Sep 27 '19 at 09:15
  • Does the column `name` only have the names from the list, or are there others not included in your `list`? Note: don't name your objects after predefined functions; `list` is a base R function. – Sotos Sep 27 '19 at 09:22
  • Maybe something like `dframe[(!duplicated(dframe[c("id","name")])) & dframe$name %in% list,]` if you need to remove duplicates and keep only the names in the specified list. – thelatemail Sep 27 '19 at 09:25
  • @Sotos the name column contains only the names in the list. – Nathalie Sep 27 '19 at 13:55
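Expanding thelatemail's base R suggestion into a full sketch: sort by date first so that `duplicated()` keeps the earliest row of each id/name pair, then restore id order. This assumes the dates parse with `as.Date()`, and it renames the name vector to `wanted` (a hypothetical name, following Sotos's advice not to shadow `base::list`).

```r
# Sample data from the question
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)
wanted <- c("Google", "Yahoo", "Amazon")  # hypothetical rename of `list`

# Sort chronologically; as.Date() avoids relying on string ordering
sorted <- dframe[order(as.Date(dframe$date)), ]

# Keep the first (earliest) row of each id/name pair, restricted to wanted names
keep <- !duplicated(sorted[c("id", "name")]) & sorted$name %in% wanted
result <- sorted[keep, ]

# Restore id order; order() leaves ties in their existing (date) order
result <- result[order(result$id), ]
rownames(result) <- NULL
print(result)
```

Sorting before deduplicating is what makes "first" mean "earliest date" rather than "first row in the original data".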

1 Answer


Here's how to do it with data.table:

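library(data.table)
setDT(dframe)
# ISO "YYYY-MM-DD" strings sort chronologically, so order(date) puts the earliest first;
# unique(..., by =) then keeps the first row of each id/name pair
date_list_first = unique(dframe[order(date)], by = c("id", "name"))[order(id)]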
webb
  • Thank you, but this doesn't match the expected output. I want to keep every name from the list, each on its first date. – Nathalie Sep 27 '19 at 09:42
  • Ah, now I understand: for each combination of id and name, you want to keep the row with the earliest date. Fixed. – webb Oct 09 '19 at 11:46
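The rule in the comment above ("for each combination of id and name, keep the row with the earliest date") can also be written literally as a grouped `which.min()`. This is a sketch, not the answerer's code; it assumes the dates parse with `as.Date()` and rebuilds the sample data directly as a data.table.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dframe <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03")
)

# For each (id, name) group, keep the row whose parsed date is smallest.
# which.min() returns the first minimum, so ties go to the earliest-listed row.
first_dates <- dframe[, .SD[which.min(as.Date(date))], by = .(id, name)]
print(first_dates)
```

Because `by =` keeps groups in order of first appearance, the rows come out in the same order as the expected output, and parsing the dates means the result does not depend on the strings being zero-padded ISO dates.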