I am having some issues with my data. I have two datasets on football matches, that are covering the same games and have the same "Match_ID" and "Country_ID" and I would like to merge the datasets. However I am having some issues. 1. I cant seem to find a way of merging the data by more than one column? and 2. One of the datasets have a few more rows than the other one. I would like to remove the rows that contains a "Match_ID" which is not in both datasets. Any Tips?
Asked
Active
Viewed 450 times
0
-
1Sounds like you need `dplyr::inner_join()`. See https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/join . You don't give a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), but something like `inner_join(football_1, football_2, by=c("Match_ID" = "Match_ID")` – pgcudahy Sep 09 '22 at 11:36
1 Answers
0
Since you didnt provide sample data, I dont know what your data look like so only taking a stab. Here is some sample data:
# 5 matches
df1 <- data.frame(match_id = 1:5,
country_id = LETTERS[1:5],
outcome = c(0,1,0,0,1),
weather = c("rain", rep("dry", 4)))
# 10 matches (containing the same 5 in df1
df2 <- data.frame(match_id = 1:10,
country_id = LETTERS[1:10],
location = rep(c("home", "away"), 5))
You can simply use merge()
:
df3 <- merge(df1, df2, by = c("match_id", "country_id"))
# Note that in this case, merge(df1, df2, by = "match_id") will
# result in the same output because of the simplicity of the sample data, but
# the above is how you merge by more than one column
Output:
# match_id country_id outcome weather location
# 1 1 A 0 rain home
# 2 2 B 1 dry away
# 3 3 C 0 dry home
# 4 4 D 0 dry away
# 5 5 E 1 dry home

jpsmith
- 11,023
- 5
- 15
- 36