
I am working on a project in R. I have two data frames with multiple entries for each employee ID in both data frames. For example, employee ID 1 has multiple entries in both Table 1 and Table 2. Therefore, there is no primary key in these tables.

I want to merge these two tables for better analysis. When I try to merge them, every row for an ID in one table is paired with every row for that ID in the other, which multiplies the rows and distorts the data in the resulting table.
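To illustrate the problem, here is a minimal sketch with made-up toy data (the column names `employeeid`, `project`, and `hours` are hypothetical). When an ID appears m times in one table and n times in the other, `merge()` returns m × n rows for that ID:

```r
# Hypothetical toy data: employee 1 has 2 rows in table1 and 3 rows in table2
table1 <- data.frame(employeeid = c(1, 1, 2),
                     project    = c("A", "B", "C"))
table2 <- data.frame(employeeid = c(1, 1, 1, 2),
                     hours      = c(5, 6, 7, 8))

# Many-to-many merge: every table1 row for an ID is paired with
# every table2 row for the same ID
table3 <- merge(table1, table2, by = "employeeid")

# Employee 1 now has 2 * 3 = 6 rows; employee 2 has 1 * 1 = 1 row
print(table(table3$employeeid))
nrow(table3)  # 7
```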

Can anyone please suggest a way around this?

HJain
  • Your question is unclear, please read and edit your question according to: [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – pogibas Aug 27 '18 at 14:00

2 Answers


You can merge two tables with the merge() function.

by = "employeeid" lets you specify the key column. If you have more than one key column, use by = c("employeeid", "period").

table3 <- merge(table1, table2, by = "employeeid")

?merge will give you more options.
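A minimal sketch of the multi-column key, assuming a hypothetical `period` column that, together with `employeeid`, identifies each row uniquely in both tables. Checking uniqueness first confirms the merge will be one-to-one:

```r
# Hypothetical data: "period" is an assumed second key column
table1 <- data.frame(employeeid = c(1, 1, 2),
                     period     = c("2018-01", "2018-02", "2018-01"),
                     sales      = c(10, 12, 7))
table2 <- data.frame(employeeid = c(1, 1, 2),
                     period     = c("2018-01", "2018-02", "2018-01"),
                     hours      = c(40, 38, 35))

# Stop early if the combined key is not unique in either table
stopifnot(!anyDuplicated(table1[, c("employeeid", "period")]),
          !anyDuplicated(table2[, c("employeeid", "period")]))

# One-to-one merge on the composite key: no row multiplication
table3 <- merge(table1, table2, by = c("employeeid", "period"))
nrow(table3)  # 3
```

If the uniqueness check fails, the chosen columns still do not form a key, and the merge would multiply rows again.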

Selcuk Akbas


One idea is to wrangle your data so there are no more multiple entries.
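One possible way to do this, sketched here as an assumption rather than a general rule: number the entries within each employee and use that number as an extra key. This only makes sense if the nth entry for an employee in one table genuinely corresponds to the nth entry in the other (for example, both tables are sorted by date). The data and the `seq` column are hypothetical:

```r
table1 <- data.frame(employeeid = c(1, 1, 2),
                     project    = c("A", "B", "C"))
table2 <- data.frame(employeeid = c(1, 1, 2),
                     hours      = c(5, 6, 8))

# Number the entries within each employee (1, 2, ...) in their current order
table1$seq <- ave(seq_len(nrow(table1)), table1$employeeid, FUN = seq_along)
table2$seq <- ave(seq_len(nrow(table2)), table2$employeeid, FUN = seq_along)

# employeeid + seq is now unique, so the merge pairs rows one-to-one
table3 <- merge(table1, table2, by = c("employeeid", "seq"))
table3
```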

Another is to summarize your data so there is only one row per employee in each table.
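A minimal base R sketch of the summarizing approach, using hypothetical columns `sales` and `hours`. Summing is an assumed choice; a mean, maximum, or count may suit the analysis better:

```r
table1 <- data.frame(employeeid = c(1, 1, 2),
                     sales      = c(10, 12, 7))
table2 <- data.frame(employeeid = c(1, 1, 1, 2),
                     hours      = c(5, 6, 7, 8))

# Collapse each table to one row per employee
sum1 <- aggregate(sales ~ employeeid, data = table1, FUN = sum)
sum2 <- aggregate(hours ~ employeeid, data = table2, FUN = sum)

# employeeid is now a primary key in both tables, so the merge is one-to-one
table3 <- merge(sum1, sum2, by = "employeeid")
table3
# employee 1: sales 22, hours 18; employee 2: sales 7, hours 8
```

With dplyr, the same collapse can be written as group_by(employeeid) followed by summarise(sales = sum(sales)).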

A third is to use a full join, which connects all matching IDs and also keeps IDs that appear in only one of the tables.

https://dplyr.tidyverse.org/reference/join.html

library(dplyr)
full_join(df1, df2, by = "EmployeeID")
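For comparison, a minimal base R sketch: merge() with all = TRUE performs the same kind of full outer join, filling in NA where an ID is missing from one table. Note that a full join does not by itself remove duplication; IDs with several rows in both tables are still multiplied, so it works best after one of the first two ideas. The data are hypothetical:

```r
table1 <- data.frame(employeeid = c(1, 2),
                     sales      = c(22, 7))
table2 <- data.frame(employeeid = c(1, 3),
                     hours      = c(18, 9))

# Full outer join: keep every employeeid from both tables;
# values missing from one side become NA
table3 <- merge(table1, table2, by = "employeeid", all = TRUE)
table3
```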

Check out the dplyr "Data Transformation Cheat Sheet": https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf

M.Viking