I have two data frames of different sizes:
dim(df1) = 2942 obs., 6 var.
dim(df2) = 16533 obs., 2307 var.
I would like to merge df1 with df2, aiming for a df3 with 2942 observations.
The following variables define observations in the data frames: serial (group identification number), id1 (person identifier within the group, ranging from 1 to the number of people in the group), and Day (the weekday on which the record was made). The Day variable is coded as: Mon.: 1, Tue.: 2, Wed.: 3, Thur.: 4, Fri.: 5, Sat.: 6, Sun.: 7.
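For readers who want to check the Day coding, here is a minimal sketch that turns the numeric codes into labels. The vector names (day_labels, day_codes, day_factor) are made up for illustration and are not part of my real data; the codes are the Day values from the sample df1.

```r
# Hypothetical helper names; the mapping follows the Day coding described above.
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Day values taken from the sample df1
day_codes <- c(1, 3, 2, 4, 2)

# Map code 1..7 to the matching weekday label
day_factor <- factor(day_codes, levels = 1:7, labels = day_labels)
print(as.character(day_factor))
```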
In df2 there are 2 observations for the same serial. I would like to have a data frame with the serials and id1s at the Day level. So basically I create a new variable, index, for df1 and df2:
library(dplyr)
df1 <- df1 %>%
  mutate(index = group_indices_(df1, .dots = c("serial", "id1", "id2")))
df2 <- df2 %>%
  mutate(index = group_indices_(df2, .dots = c("serial", "id1")))
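One thing that may matter here, offered as a hedged illustration rather than a confirmed diagnosis: group numbers generated separately in each table are only meaningful within that table. The sketch below uses two small made-up data frames (a and b, not my real data) and base R instead of group_indices_, and shows the same serial/id1 pair receiving different numbers in each table.

```r
# Toy data, invented for illustration only
a <- data.frame(serial = c(12, 123, 10), id1 = c(1, 1, 1))
b <- data.frame(serial = c(12, 10, 10, 123), id1 = c(1, 1, 3, 1))

# Number the serial/id1 groups separately within each table
a$index <- as.integer(factor(paste(a$serial, a$id1)))
b$index <- as.integer(factor(paste(b$serial, b$id1)))

# Compare the number assigned to the pair (serial = 12, id1 = 1) in each table
idx_a <- a$index[a$serial == 12 & a$id1 == 1]
idx_b <- b$index[b$serial == 12 & b$id1 == 1]
print(c(a = idx_a, b = idx_b))
```

Because b contains a group (serial 10, id1 3) that a does not, the numbering shifts, so equal index values in the two tables do not have to refer to the same person.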
Please see the sample data.
I then used the following code to merge:
library(dplyr)
df3 <- inner_join(df1, df2, by = c("index", "Day"), suffix = c(".df1", ".df2"))
...and I get a df3 with 65 obs. and 2310 var. instead of 2942 obs. and 2310 var.
Can somebody explain why I have this issue?
Sample data:
df1
structure(list(serial = c(12, 123, 123, 10, 10), id1 = c(1, 1,
2, 1, 2), Day = c(1, 3, 2, 4, 2)), class = "data.frame", row.names = c(NA,
-5L))
df2
structure(list(serial = c(12, 12, 123, 123, 123, 123, 10, 10,
10, 10, 10, 10), id1 = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
id2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Day = c(1, 6,
3, 7, 2, 7, 4, 7, 2, 7, 4, 7), index = c(7L, 8L, 9L, 10L,
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(NA, -12L
), class = "data.frame")
Desired outcome for the sample data:
serial id1 id2 Day
    12   1   1   1
   123   1   1   3
   123   2   1   2
    10   1   1   4
    10   2   1   2
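For reference, the desired outcome can be reproduced on the sample data by joining directly on the shared columns instead of on index. This is only a sketch under the assumption that serial, id1 and Day together identify a row of df1; I have not verified that it gives 2942 rows on the full data. The sample data frames are rebuilt here without the index column.

```r
library(dplyr)

# Sample data from above, rebuilt without the index column
df1 <- data.frame(serial = c(12, 123, 123, 10, 10),
                  id1    = c(1, 1, 2, 1, 2),
                  Day    = c(1, 3, 2, 4, 2))
df2 <- data.frame(serial = c(12, 12, 123, 123, 123, 123, 10, 10, 10, 10, 10, 10),
                  id1    = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
                  id2    = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
                  Day    = c(1, 6, 3, 7, 2, 7, 4, 7, 2, 7, 4, 7))

# Join on the natural keys, then order the columns as in the desired outcome
df3 <- inner_join(df1, df2, by = c("serial", "id1", "Day")) %>%
  select(serial, id1, id2, Day)

print(df3)
```

On the sample data this keeps one row per row of df1, which is the shape I am after.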