0

I have two dataframes of different lengths, and I am trying to use the data of one df to complete the data from the second dataframe. My df1 is a list of projects and organizations participating in these projects, with various information, and my df2 is a list of all organizations involved in these projects and some additional informations that I want to add to df1 to have a more complete picture.

For instance, a shorter example of my inputs:

df1
project_id      org_funds        project_year     org_ID
1                     100              2016          11
1                     500              2016          13
2                     200              2020          11
2                     150              2020          12
2                     350              2020          13
df2
org_ID      org_loc
11              USA
12               NL
13               CN

I am trying to add the info from the column org_loc to df1, so that for every org_ID that matches in the two df, I can add the corresponding org_loc information.

My expected output would be:

df1
project_id      org_funds        project_year     org_ID     org_loc
1                     100              2016          11         USA
1                     500              2016          13          CN
2                     200              2020          11         USA
2                     150              2020          12          NL
2                     350              2020          13          CN

I cannot do it manually, as my df1 have thousands of rows and dozens of columns.

I tried several things, like a for loop, a combination of for loop and if function, and of mutate function and if function. Nothing worked so far because as I understood it, these all require that the two df have the same length which is not the case here.

For instance:

df1$org_loc <- 1 
if ((df1$org_ID[]) == (df2$org_ID[])){
    df1$org_loc[] <- df2$org_loc[]
}

which returned 2 errors : "the condition has length > 1" and "longer object length is not a multiple of shorter object length"

or another thing I tried:

df1 <- df1 %>%
    mutate(org_loc = case_when(df1$org_ID == df2$org_ID ~ df2$org_loc ))

which returns "error in mutate() Caused by error in case_when(): ! Can't recycle ..1 (left) (size 7628) to match ..1 (right) (size 3766)."

(7628 is the size of my df1, 3766 the size of my df2)

I've been searching for days and I didn't find any solution that allows me to do this with two objects of different lengths.

Could any of you have any solution to my problem? Thanks!

jchd
  • 1

0 Answers0