2

I've got two dataframes (df1 and df2), an example of which looks like these:

df1 <- data.frame(StationID = c(1,1,1,2,2,3,3,3,3,3),
              Cameras       = c("Cam1","Cam2","Cam2","Cam1","Cam1","Cam2","Cam1","Cam2","Cam1","Cam1"),
              Start         = c("2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23"),
              End           = c("2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25"))


df2 <- data.frame(StationID = c(1,1,2,2,3,3),
                  Cameras   = c("Cam1","Cam2","Cam1","Cam2","Cam1","Cam2"))

I want to produce a new dataframe (df3), which looks for a match between two columns (StationID and Cameras) and then appends "Start" and "End" date columns to the corresponding matches. The code needs to dynamically add new columns based on the data, since some instances will have no matches, whilst others will have many matches.

Example output below:

  StationID Cameras     Start1       End1     Start2       End2     Start3       End3
1         1    Cam1 2013-04-23 2013-04-25       <NA>       <NA>       <NA>       <NA>
2         1    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
3         2    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
4         2    Cam2       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
5         3    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25 2013-04-23 2013-04-25
6         3    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>

I'd appreciate any help on this task.

Thanks in advance!

Ross
  • 359
  • 2
  • 11

2 Answers2

2

We join the two datasets on 'StationID' and 'Cameras' and use dcast from data.table which can take multiple value.var columns to reshape to 'wide' format.

 library(data.table)#1.9.7+
 dcast(setDT(df1)[df2, on = c("StationID", "Cameras")], 
     StationID + Cameras ~rowid(StationID, Cameras), value.var = c("Start", "End"))
 # StationID Cameras    Start_1    Start_2    Start_3      End_1      End_2      End_3
 #1:         1    Cam1 2013-04-23         NA         NA 2013-04-25         NA         NA
 #2:         1    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
 #3:         2    Cam1 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
 #4:         2    Cam2         NA         NA         NA         NA         NA         NA
 #5:         3    Cam1 2013-04-23 2013-04-23 2013-04-23 2013-04-25 2013-04-25 2013-04-25
 #6:         3    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA

NOTE: THe rowid is from the data.table_1.9.7. It can be installed from here. If we have version 1.9.6 or older, create the rowid by

 dN <- setDT(df1)[df2, on = c("StationID", "Cameras")
                     ][, rid := 1:.N, .(StationID, Cameras)]

and then do the dcast

dcast(dN, StationID + Cameras ~rid, value.var = c("Start", "End"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • thanks for this suggestion. It seems to be exactly what I'm looking for. When I run your code, however, I get the following error (Error in eval(expr, envir, enclos) : could not find function "rowid"). What am I doing wrong here? – Ross Jul 12 '16 at 07:08
  • 2
    I managed to get this working. I had the CRAN version of data.table, rather than the development version (v.1.9.7) which includes the rowid function. For those that need to access this version, simply go to https://github.com/Rdatatable/data.table/wiki/Installation. Thanks again for everyone's help! – Ross Jul 12 '16 at 07:28
0

Maybe this is helpful

library(dplyr)
library(tidyr)
full_join(df1,df2) %>% group_by(StationID,Cameras) %>% summarise_each(funs(toString)) %>% separate(col = Start,into = paste("Start",1:3,sep=""),sep=", ",extra="merge") %>% separate(col = End,into = paste("End",1:3,sep=""),sep=", ",extra="merge")
user2100721
  • 3,557
  • 2
  • 20
  • 29
  • Thanks for this suggestion. This seems to work to an extent; Station2 Camera2 is missing. I guess this is because of all the NA's, but is there a way of keeping all records in df3, including those that comprise all NA's? Thanks – Ross Jul 12 '16 at 07:22
  • @Ross : added that. Sorry I didn't notice it. – user2100721 Jul 12 '16 at 08:13