It looks like both datasets contain a column called `name`. When you merge by `country`, both columns end up in the new dataset, with ".x" and ".y" appended to their names for disambiguation.
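To see where the suffixes come from, here is a minimal base-R sketch with two made-up data frames (the names `a` and `b` are hypothetical) that share a non-key column, `year`, besides the `country` key. Merging on `country` alone keeps the shared column twice with suffixes. Because each country also appears more than once in both tables, every row for a country is paired with every other row for that country:

```r
# Two small, made-up data frames that share a non-key column (year)
a <- data.frame(country = "Austria", year = c(2014, 2015), fdi = c(0.39, -2.09))
b <- data.frame(country = "Austria", year = c(2014, 2015), scorepol = c(6.44, 5.12))

# Merging on country only: the shared year column is kept twice, with suffixes
m <- merge(a, b, by = "country")
names(m)  # "country" "year.x" "fdi" "year.y" "scorepol"
nrow(m)   # 4: each Austria row in a is paired with each Austria row in b
```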
If your observations are specific to a country-year combination, you will want to merge by `country` and `year` instead. That way you will not end up with duplicate `year` columns in your final dataset. Of course, you will first need to clean the two `year` columns so that they match each other (i.e., remove the leading "X" and give both columns the same type).
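One way to confirm that the cleaned keys actually line up is dplyr's `anti_join()`, which returns the rows of the first table that have no match in the second. A sketch with hypothetical mini tables `scores` and `fdi` (made-up values): before cleaning, no `fdi` row finds a partner. After stripping the "X" and converting to the same type, all of them do:

```r
library(dplyr)

# Hypothetical mini versions of the two datasets; fdi still has the raw "X" prefix
scores <- tibble(country = "Austria", year = c(2014, 2015, 2016), scorepol = c(6.44, 5.12, 6.25))
fdi <- tibble(country = "Austria", year = c("X2014", "X2015"), fdi = c(0.387, -2.09))

# Before cleaning: compare as character (joining chr to dbl would error);
# "X2014" never equals "2014", so every fdi row is unmatched
unmatched_before <- anti_join(fdi, mutate(scores, year = as.character(year)),
                              by = c("country", "year"))

# After cleaning: strip the leading "X" and convert to the same type as scores$year
fdi_clean <- fdi |> mutate(year = as.numeric(sub("^X", "", year)))
unmatched_after <- anti_join(fdi_clean, scores, by = c("country", "year"))

nrow(unmatched_before)  # 2
nrow(unmatched_after)   # 0
```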
Here is an example. I don't know exactly what your initial data look like, so treat the data below as an approximation:
library(tidyverse)
# define data
euexit <- tribble(
  ~country,  ~year, ~scorepol,
  "Austria", 2019,  5.520714,
  "Austria", 2018,  5.867006,
  "Austria", 2014,  6.43598,
  "Austria", 2017,  4.910919,
  "Austria", 2015,  5.122485,
  "Austria", 2016,  6.251086
)
idelong <- tribble(
  ~country,  ~year,   ~fdi,
  "Austria", "X2014", 0.38685812,
  "Austria", "X2015", -2.08805654
)
# clean year columns: both become character strings without the "X" prefix
euexit <- euexit |>
  mutate(year = as.character(year))
idelong <- idelong |>
  mutate(year = str_remove(year, "^X"))
# merge by both keys (join_by() requires dplyr >= 1.1.0)
idelong |>
  left_join(euexit, join_by(country, year))
#> # A tibble: 2 × 4
#>   country year     fdi scorepol
#>   <chr>   <chr>  <dbl>    <dbl>
#> 1 Austria 2014   0.387     6.44
#> 2 Austria 2015  -2.09      5.12
Created on 2023-03-23 with reprex v2.0.2
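If you are not using the tidyverse, a roughly equivalent base-R version might look like the sketch below: `sub()` stands in for `str_remove()`, and `merge(..., all.x = TRUE)` plays the role of `left_join()`. Here both `year` columns are converted to integers rather than characters, which works just as well as long as both sides end up with the same type. Note that `merge()` also sorts the result by the key columns by default.

```r
# Same example data as above, built with base R
euexit <- data.frame(
  country = "Austria",
  year = c(2019, 2018, 2014, 2017, 2015, 2016),
  scorepol = c(5.520714, 5.867006, 6.43598, 4.910919, 5.122485, 6.251086)
)
idelong <- data.frame(
  country = "Austria",
  year = c("X2014", "X2015"),
  fdi = c(0.38685812, -2.08805654)
)

# Make the keys comparable: strip the leading "X" and use the same type on both sides
idelong$year <- as.integer(sub("^X", "", idelong$year))
euexit$year <- as.integer(euexit$year)

# Merge on both keys; all.x = TRUE keeps every idelong row, like left_join()
merged <- merge(idelong, euexit, by = c("country", "year"), all.x = TRUE)
merged
```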