0

I have two data frames with different number of rows (>5000) and columns (~10). How ever, they both contain two columns that interest me: "Name" and "Position". These are historical job positions in two different companies. My goal is to find the individuals that worked in company 1 (df1) before working in company 2 (df2) and their position in both companies.

In each dataframe, column Names has all unique values. Not all names in df1 appear in df2, and not in the same order.

In column Position, there are only two values in both df: manager and intern.

df1 <- data.frame("Name" = c("Adam","Emma","Liz"), 
                  "Position" = c("intern","manager","intern"), 
                  "X" = c(123,321,111))

df2 <- data.frame("Name" = c("Adam","Liz","Carl","Sarah"), 
                  "Position" = c("manager","intern","intern","manager"), 
                  "Y" = c(999,987,789,777))

I need help with a code that compares the "Name" column in both dataframes and returns as a result a dataframe with all the columns of df2 of those rows that find a matching value in Names. This would represent current workers in company 2 that once worked in company 1.

Nonetheless, I also need to know their previous job position in company 1, which is why I would need the resulting dataframe to have a new column with the Position of df1 for that Name.

Something like this:

result <- data.frame("Name" = c("Adam","Liz"), 
                     "Position" = c("manager","intern"), 
                     "Y" = c(999,987)
                     "Position_df1" = c("intern","intern"))

I searched in other asked questions with no success.

Thanks

ABoyco
  • 1
  • 1

1 Answers1

0

You can try this approach

library(tidyverse)

result <- df2 %>% 
  inner_join(df1, by = c("Name")) %>% 
  select(Name, Position = Position.x, Y, Position_df1 = Position.y)
#   Name Position   Y Position_df1
# 1 Adam  manager 999       intern
# 2  Liz   intern 987       intern
Tho Vu
  • 1,304
  • 2
  • 8
  • 20