I have two dataframes both with columns that have ID numbers that should match. However, the two dataframes do not have the same number of ID's. One has 118 IDs and one has 103 ID's. The dataframe that has 118 ID's has all 103 ID's though, but some extras. I am wanting to take two columns from 118 row dataframe and merge them with the IDs that match in the 103 row dataframe based on ID number. I dont know the easiest way to go about this problem.
-
2check out the help file for the function `merge()`, or if you are using the tidyverse look at `left_join()` – gfgm Feb 06 '19 at 16:12
-
Take a look at `?dplyr::left_join` – Douglas Mesquita Feb 06 '19 at 16:13
-
RStudio provides cheat sheets for several different categories, including "Data Transformation". Check out the link here: https://www.rstudio.com/resources/cheatsheets/ – Prevost Feb 06 '19 at 16:19
1 Answers
You should consider using merge
instruction. Assuming that you have 2 dataframes and they have the column ID
with the identifiers:
merge(dataframe1, dataframe2, by = "ID")
If they have identifiers but the name of the column is different (I used ID_dfX
) in each dataframe you should use:
merge(dataframe1, dataframe2, by.x = "ID_df1", , by.y = "ID_df2")
With this, you'll have a dataframe where the same ID is in both dataframes. The rest of information is deleted. This is an inner join
.
If you want to add the information of one dataframe to another you can use, assuming that you don't want to touch dataframe1 and you just want to add information of dataframe2 that share the ID (left join
):
merge(dataframe1, dataframe2, by.x = "ID_df1", , by.y = "ID_df2", all.x = TRUE)
Here you'll find some information about joins (merging):
(INNER) JOIN: Returns records that have matching values in both dataframes.
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the right table.
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from the left table.
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table.