0

I am having some issues with my data. I have two datasets on football matches, that are covering the same games and have the same "Match_ID" and "Country_ID" and I would like to merge the datasets. However I am having some issues. 1. I cant seem to find a way of merging the data by more than one column? and 2. One of the datasets have a few more rows than the other one. I would like to remove the rows that contains a "Match_ID" which is not in both datasets. Any Tips?

drumtr
  • 1
  • 1
  • 1
    Sounds like you need `dplyr::inner_join()`. See https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/join . You don't give a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), but something like `inner_join(football_1, football_2, by=c("Match_ID" = "Match_ID")` – pgcudahy Sep 09 '22 at 11:36

1 Answers1

0

Since you didnt provide sample data, I dont know what your data look like so only taking a stab. Here is some sample data:

# 5 matches
df1 <- data.frame(match_id = 1:5,
                  country_id = LETTERS[1:5],
                  outcome = c(0,1,0,0,1),
                  weather = c("rain", rep("dry", 4)))

# 10 matches (containing the same 5 in df1
df2 <- data.frame(match_id = 1:10,
                  country_id = LETTERS[1:10],
                  location = rep(c("home", "away"), 5))

You can simply use merge():

df3 <- merge(df1, df2, by = c("match_id", "country_id"))

# Note that in this case, merge(df1, df2, by = "match_id") will 
# result in the same output because of the simplicity of the sample data, but 
# the above is how you merge by more than one column

Output:

#   match_id country_id outcome weather location
# 1        1          A       0    rain     home
# 2        2          B       1     dry     away
# 3        3          C       0     dry     home
# 4        4          D       0     dry     away
# 5        5          E       1     dry     home
jpsmith
  • 11,023
  • 5
  • 15
  • 36