1

Here are two dfs of example data:

df1

ID      First.seen  Last.seen 
A10   2015-09-07  2015-09-16       
A11   2015-09-07  2015-09-19 



df2
 ID      First_seen  Last_seen
 A1      2015-09-07  0
A10      2015-09-07  0

I want to fill df2$Last_seen if the ID is common in both dfs. Note that in the real data I have several IDs in both dfs. I've tried with for loop but I just get numerical values:

for (i in 1:nrow(df2)){
  if (df2$ID[i] %in% df1$ID) {
    df2$Last_seen[i] <- df1$Last.seen[df1$ID == df2$ID[i]]
  }else{
    df2$Last_seen[i] <- 0
  }
}

I found this answer to the same question that uses seq_along but I get a result of df1$Last_seen[i] == 1 when I apply this code:

 for (i in seq_along(1:nrow(df2))){
      if (df2$ID[i] %in% df1$ID) {
        df2$Last_seen[i] <- df1$Last.seen[df1$ID == df2$ID[i]]
      }else{
        df2$Last_seen[i] <- 0
      }
    }

Any suggestions on how to use it properly?

Community
  • 1
  • 1
Bonono
  • 827
  • 1
  • 9
  • 18

1 Answers1

0

You don't need a loop to do that. You need to join the tables on ID. This can be done with dplyr:

df1 <- read.table(text="ID      First.seen  Last.seen
A10   2015-09-07  2015-09-16
A11   2015-09-07  2015-09-19",header=TRUE, stringsAsFactors=FALSE)

df2<- read.table(text="ID      First_seen  Last_seen
 A1      2015-09-07  0
A10      2015-09-07  0",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
left_join(df2,df1)
   ID First_seen Last_seen First.seen  Last.seen
1  A1 2015-09-07         0       <NA>       <NA>
2 A10 2015-09-07         0 2015-09-07 2015-09-16

If you want a three column table:

left_join(df2,df1, by=c("ID" = "ID","First_seen"="First.seen")) %>%
mutate(Last_seen=ifelse(is.na(Last.seen),Last_seen,Last.seen)) %>%
select(-Last.seen)

   ID First_seen  Last_seen
1  A1 2015-09-07          0
2 A10 2015-09-07 2015-09-16

EDIT To change occurences where Last_seen is 0, you can ad another ifelse:

left_join(df2,df1, by=c("ID" = "ID","First_seen"="First.seen")) %>%
mutate(Last_seen=ifelse(is.na(Last.seen),Last_seen,Last.seen),
       Last_seen=ifelse(Last_seen==0,format(as.Date(First_seen)+16,"%Y-%m-%d"),Last.seen))%>%
select(-Last.seen)

   ID First_seen  Last_seen
1  A1 2015-09-07 2015-09-23
2 A10 2015-09-07 2015-09-16

EDIT2

left_join(df2,df1, by=c("ID" = "ID","First_seen"="First.seen")) %>%
mutate(Last_seen=ifelse(is.na(Last.seen),Last_seen,Last.seen),
       Last_seen=ifelse(Last_seen==0,format(as.Date(First_seen)+16,"%Y-%m-%d",origin = "1900-01-01"),Last.seen))%>%
select(-Last.seen)

   ID First_seen  Last_seen
1  A1 2015-09-07 2015-09-23
2 A10 2015-09-07 2015-09-16
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
  • Brilliant, makes a lot of sense. What if I wanted to change the `0` value in the `Last_seen` column of the resulting `df` to the value of the `First_seen + 16 days` – Bonono Mar 27 '17 at 13:48
  • I just tested out the "three column table" solution using my full data and I get numerical values again instead of dates. Any ideas? – Bonono Mar 27 '17 at 14:15
  • @Bonono If the object you created with left_join... is df1, do that after everything: `df1$Last_seen <- as.Date(df1$Last_seen,format("%Y-%m-%d"))` – Pierre Lapointe Mar 27 '17 at 14:19
  • I had that problem last time - it just turns the numerical values in to `NA`s – Bonono Mar 27 '17 at 14:36
  • I think it has to do with the `classes` of the columns. Everything done with this example has been `character ` class whilst I'm using `POSIXct`. Would you know how to tackle that problemmor am I going to have to convert to `character` first? – Bonono Mar 27 '17 at 14:39
  • 1
    Yes, that's what's happening. See my second edit. I just keep everything as character. You can change it to date later if you want . – Pierre Lapointe Mar 27 '17 at 14:46