Two year columns when merging datasets

Question

I want to make a linear regression of two sets of variables, so I merged them into one table, but two "year" columns show up. How can I only have one?

enter image description here

I used this code to merge my idelong and euexit data:

> fichierfinal<-merge (idelong,euexit, by= 1)

and tried to solve my issue with this one but it didn't work :

> datamerge<- idelong %>% right_join(euexit,by=c(1,2))

I would be very grateful for your help, thank you!

Welcome to StackOverflow! Help us help you by making your example [reproducible (and minimal)](https://stackoverflow.com/questions/5963269) — Aurèle, Mar 23 '23 at 09:47

dufei · Accepted Answer · 2023-03-23T11:36:11.617

0

It looks like both datasets contain a column called name. When you merge by country, both columns end up in the new dataset with ".x" and ".y" added for disambiguation.

If your observations are specific to a country-year combination, you will want to merge by country and year instead. That way you will not have duplicate year columns in your final dataset. Of course you will need to clean the respective year columns first so that they match each other (i.e., remove the "X").

Here is an example (not knowing exactly what your initial data looks like):

library(tidyverse)

# define data
euexit <- tribble(
  ~country, ~year, ~scorepol,
  "Austria",  2019,  5.520714,
  "Austria",  2018,  5.867006,
  "Austria",  2014,   6.43598,
  "Austria",  2017,  4.910919,
  "Austria",  2015,  5.122485,
  "Austria",  2016,  6.251086
)

idelong <- tribble(
  ~country,   ~year,        ~fdi,
  "Austria", "X2014",  0.38685812,
  "Austria", "X2015", -2.08805654
)

# clean year columns
euexit <- euexit |> 
  mutate(year = as.character(year))

idelong <- idelong |> 
  mutate(year = str_remove(year, "^X"))

# merge
idelong |> 
  left_join(euexit, join_by(country, year))
#> # A tibble: 2 × 4
#>   country year     fdi scorepol
#>   <chr>   <chr>  <dbl>    <dbl>
#> 1 Austria 2014   0.387     6.44
#> 2 Austria 2015  -2.09      5.12

^{Created on 2023-03-23 with reprex v2.0.2}

edited Mar 23 '23 at 11:36

answered Mar 23 '23 at 10:39

dufei

2,166
1
7
18

Hi, thank you I see the problem! However I don't know how to remove the X, since in my excel file there is no X in front of the years, but on R there is. Is there a code to remove it? Thank you! – euroscepticism Mar 23 '23 at 10:42
Have a look at the sample code I added. – dufei Mar 23 '23 at 10:48
Thank you very much! I was able to remove the X and now I still have two columns that are called year.x and year.y When I run the code, to merge by country and year it says : `Error in `left_join()`: ! Can't join `x$year` with `y$year` due to incompatible types. ℹ `x$year` is a . ℹ `y$year` is a .` Should I change y$year into a character? If so how? I tried using `as.character(year.y)`but it didn't work. Thank you very much for your help – euroscepticism Mar 23 '23 at 11:31
Exactly as you describe - I included a line in my code above. The line you suggested doesn't work because you need to convert the column *before* you merge. At that point, the column is still called `year` rather than `year.y`. – dufei Mar 23 '23 at 11:36
Sorry, I hadn't seen it. But even when I run everything, I still have an error telling me that the columns from each set do not match together : idelong |> left_join(euexit, join_by(country, year)) Error in `left_join()`: ! Join columns in `x` must be present in the data. ✖ Problem with `year`. I checked, and the `year`column is present in my `euexit` dataframe, so I don't understand. Sorry for all the questions, I'm just really lost. Thank you for your patience! – euroscepticism Mar 23 '23 at 11:51
Is it present in `idelong` too? – dufei Mar 23 '23 at 11:59
yes. I think it finally worked, I have a correct table that appeared in my console. However I would like it to be a data on its own, that I can open in my Environment. How can I make it "become a data" that I can use? – euroscepticism Mar 23 '23 at 12:06
Right now I have this in my console : > idelong |> + left_join(euexit, join_by(country, name)) # A tibble: 162 × 4 country name `foreign direct investments` scorepol 1 Belgium 2014 -2.84 1.64 2 Belgium 2015 -4.22 1.80 3 Belgium 2016 12.1 1.54 4 Belgium 2017 -7.42 1.22 5 Belgium 2018 -7.66 2.78 6 Belgium 2019 -3.99 2.53 ...And it goes on. – euroscepticism Mar 23 '23 at 12:07
Just save it as a new object: `merged_data <- idelong |> left_join(euexit, join_by(country, year))` – dufei Mar 23 '23 at 12:44
Thank you very much it's all good! – euroscepticism Mar 23 '23 at 13:05
Great! Feel free to mark the answer as the accepted one if you'd like. – dufei Mar 23 '23 at 13:06

Two year columns when merging datasets

1 Answers1