Merge Dyad_Year with Country_Year data

Question

I have two data-frames, one dyad-year and the other country-year.

      X ccode1 ccode2 ccdistance            countryname_1 countryname_2 majorpower_1 majorpower_2   milex_1 milper_1
    1 1      2     20          0 United States of America        Canada            1            0 143981000     2050
    2 2      2     31        957 United States of America       Bahamas            1            0 143981000     2050
    3 3      2     40       1129 United States of America          Cuba            1            0 143981000     2050
    4 4      2     41       1437 United States of America         Haiti            1

Country-Year:

       ccode1  year Fac1_A Fac2_A Fac3_A
        <int> <int>  <dbl>  <dbl>  <dbl>
     1      2  1980 -0.661   4.66   15.5
     2      2  1981 -0.661   4.66   15.5
     3      2  1982 -0.661   5.11   15.5
     4      2  1983 -0.661   5.21   15.5
     5      2  1984 -0.661   5.66   15.5
     6      2  1985 -0.661   5.21   15.5
     7      2  1986 -0.661   5.21   15.5
     8      2  1987 -0.661   5.21   15.5
     9      2  1988 -0.661   5.21   15.5
    10      2  1989 -0.661   5.00   15.5

I'd like to merge these two data-frames so that each country in the dyad has a FacX value; however, my attempts have given me either an error or lots of NAs. I first attempted to use a simple ifelse:

    Demo_Dyad$Fac1_A_NR <- ifelse(Demo_Dyad$ccode1 == Cntry_yr$ccode1 &
                            Demo_Dyad$year == Cntry_yr$year,
                          Cntry_yr$Fac1_A, NA)

However, that results in each country in the Dyad_Year only having the value once. So e.g. USA <--> Haiti 1981 might have value X, but USA <--> Cuba 1981 will be NA.

I then attempted to do it by grouping in dplyr:

     Demo_Dyad %>%
     group_by(ccode1, year) %>%
     mutate(Fac1_A_NR <- ifelse(ccode1 == Cntry_yr$ccode1 &
                            year == Cntry_yr$year, Cntry_yr$Fac1_A, NA))

But I get the error: Error in `$<-.data.frame`(`*tmp*`, Fac1_A_NR, value = c(-0.660552389122193, : replacement has 4942 rows, data has 217149

If anyone can see what is wrong with my code I would greatly appreciate it.

Use `Fac1_A_NR =`, not `<-`. In general, you should not be using the `<-` operator inside of other functions like that; while it can be done (and can work well in ways that normal `=` does not), it typically is more complicated than necessary and does not do what you expect (as here). — r2evans, Feb 14 '20 at 09:27
Also, unless `Cntry_yr` is a single row, you can't use an equality comparison between two different-length vectors. When you say you'd like to merge them, perhaps you should literally `merge` them? — r2evans, Feb 14 '20 at 09:29
Lastly, while I might have something that could work, your `ifelse` suggests that the merge should be on `ccode1` (which does not vary at all in `Cntry_yr`) and `year` (which is not present in `Demo_Dyad`). Also, your dyad data fourth row appears to be incomplete. — r2evans, Feb 14 '20 at 09:35
Does this answer your question? [How do I combine two data-frames based on two columns?](https://stackoverflow.com/questions/6709151/how-do-i-combine-two-data-frames-based-on-two-columns) — Ashish, Feb 14 '20 at 09:43
I only now noticed that the top DF did not come out as I expected, sorry about that. The two DFs shown are only a small part of the actual data, and both contain 198 different countries. The dyadic one therefore has 198*198 combinations, and it does contain year. I completely forgot about merge though, thank you, going to attempt that one. — Eric Nilsen, Feb 14 '20 at 10:10
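
To illustrate the length issue raised in the comments, here is a minimal sketch with made-up toy data (the names `dyad`, `cntry`, `same_pos`, and `looked_up`, and all values, are hypothetical and not the asker's data). `==` compares two vectors position by position and recycles the shorter one, so it does not look up each dyad row's country-year in the other table:

```r
# Toy stand-ins for the two tables (hypothetical values)
dyad <- data.frame(
  ccode1 = c(2, 2, 2, 2),
  ccode2 = c(20, 31, 20, 31),
  year   = c(1980, 1980, 1981, 1981)
)
cntry <- data.frame(
  ccode1 = c(2, 2),
  year   = c(1980, 1981),
  Fac1_A = c(-0.661, -0.5)
)

# Position-by-position comparison: the 2-row cntry columns are recycled
# to length 4, so dyad row i is compared with cntry row ((i - 1) %% 2) + 1
same_pos <- dyad$ccode1 == cntry$ccode1 & dyad$year == cntry$year
same_pos

# ifelse() only fills in a value where the rows happen to line up
looked_up <- ifelse(same_pos, cntry$Fac1_A, NA)
looked_up
```

In this toy case every dyad row has a matching country-year in `cntry`, yet only the rows whose positions happen to coincide get a value, which is consistent with the pattern of `NA`s described in the question.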

score 0 · Answer 1 · answered Feb 14 '20 at 10:00

If the whole task is to merge two dataframes based on a column or columns they have in common, then use merge. For example:

DATA:

    # Reproducible toy data: two data frames sharing an ID column under different names
    set.seed(111)
    df_a <- data.frame(
      Xccode = 1:10,
      v1a = rnorm(10),
      v2a = sample(LETTERS[1:5], 10, replace = TRUE))

    df_b <- data.frame(
      ccode = 1:10,
      v1b = rnorm(10, 5),
      v2b = sample(LETTERS[4:7], 10, replace = TRUE))

SOLUTION:

Assuming that the column the two dataframes have in common is called `Xccode` in one and `ccode` in the other, you can use `merge` and specify the two columns to merge on via `by.x` and `by.y`:

    df_ab <- merge(df_a, df_b, by.x = "Xccode", by.y = "ccode")
    df_ab
       Xccode        v1a v2a      v1b v2b
    1       1  0.2352207   B 3.806391   E
    2       2 -0.3307359   A 5.364187   E
    3       3 -0.3116238   C 5.361662   E
    4       4 -2.3023457   A 5.346964   G
    5       5 -0.1708760   C 5.189737   D
    6       6  0.1402782   E 4.840423   D
    7       7 -1.4974267   A 5.326549   F
    8       8 -1.0101884   A 5.598254   D
    9       9 -0.9484756   A 3.158466   F
    10     10 -0.4939622   C 7.718056   G

I guess in this problem he is trying to match on two fields of the data frame, so you should pass two column names in each of the `by.x` and `by.y` parameters. — Ashish, Feb 14 '20 at 10:07
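
Following the comment above, a two-key merge for the dyad-year case might be sketched as follows. This assumes the full dyad data has `ccode1`, `ccode2`, and `year` columns (the asker confirms that `year` exists in the real data) and uses small made-up tables. The names `dyad`, `cntry`, `side1`, `side2`, and `merged`, and the `_1`/`_2` suffixes, are choices made for this example only:

```r
# Toy dyad-year table (hypothetical values)
dyad <- data.frame(
  ccode1 = c(2, 2, 20, 20),
  ccode2 = c(20, 20, 2, 2),
  year   = c(1980, 1981, 1980, 1981)
)
# Toy country-year table (hypothetical values)
cntry <- data.frame(
  ccode1 = c(2, 2, 20, 20),
  year   = c(1980, 1981, 1980, 1981),
  Fac1_A = c(-0.66, -0.60, 1.10, 1.20)
)

# Values for the first country in the dyad: match on ccode1 and year
side1 <- cntry
names(side1)[names(side1) == "Fac1_A"] <- "Fac1_A_1"
merged <- merge(dyad, side1, by = c("ccode1", "year"), all.x = TRUE)

# Values for the second country: same table, but keyed on ccode2
side2 <- cntry
names(side2) <- c("ccode2", "year", "Fac1_A_2")
merged <- merge(merged, side2, by = c("ccode2", "year"), all.x = TRUE)

merged
```

Here `all.x = TRUE` keeps dyad rows that have no match in the country-year table (they get `NA`), which makes missing country-years easy to spot. When the key columns have the same names in both tables, a single `by` vector is enough; `by.x` and `by.y` are only needed when the names differ.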
