0

The goal is to mutate a new column to a df1, by matching a column in both df1 and df2, and returning the matching row value for another column (col5) in df2, [AND ONLY THE VALUE OF THAT COLUMN] which will be added to df1.

    df1 <- data.frame(c(1,1,1,2,2,2,3,3,3), c(9,9,9,8,8,8,7,7,7)), 
       c(6,6,6,5,5,5,4,4,4), c('A','A','A','B','B','B','C','C','C')
    colnames(df1) <- c('col1','col2','col3', 'col4')

    df2 <- data.frame(c(1,2,3), c(100,200,300), c('D','E','F'))
    colnames(df2) <- c('col1','col5', 'col6)

the end result should be

col1 col2 col3 col4 col5
1 9 6 A 100
1 9 6 A 100
1 9 6 A 100
2 8 5 B 200
2 8 5 B 200
2 8 5 B 200
3 7 4 C 300
3 7 4 C 300
3 7 4 C 300

I've tried things like

    df1 <- df1 %>%
       mutate(ifelse(col1==df2$col1, df$2col2, "")

But doesn't work. Suggestions hugely appreciated. left_join almost works, but I am trying to avoid merge all of the columns of the two dataframes. I only want to match col1 in both dataframes, and add col5 from df2 to df1.

ambergris
  • 17
  • 4
  • 2
    You need a join (i.e. `merge()`). In dplyr `df1 %>% left_join(df2, by = "col1")` – harre Aug 24 '22 at 14:01
  • I thought so but that doesn't go far enough. My real data has many more columns in both dataframes, so I need something more to specify which column I want pulled out. If i simply use left_join, it makes a huge mess out of my final dataframe. Is there a solution to select the specific column I want to join from df2? – ambergris Aug 24 '22 at 14:14
  • 3
    Base R `merge(df1, df2, by = "col1", all.x = TRUE)`. BTW, it helps when your code matches your output, but here the column names are `"c.1..1..1..2..2..2..3..3..3."` and `"c.9..9..9..8..8..8..7..7..7."`, not `"col1"` and `"col2"`. Said more plainly, please actually run the code you post in questions. – r2evans Aug 24 '22 at 14:14
  • 2
    Okay ambergris, I've reopened with question for a bit. Please [edit] your question and provide more (and more accurate) data that better demonstrates the problem you say you have. – r2evans Aug 24 '22 at 14:15
  • 1
    Subset your `df2` to only include (a) the column(s) to join on and (b) the column(s) you want added, e.g., `df1 %>% left_join(select(df2, col1, col2), by = "col1")`. – Gregor Thomas Aug 24 '22 at 14:25
  • So, based on your edits, `df1 %>% left_join(select(df2, col1, col5), by = "col1")`. Give the columns you want to the join. – Gregor Thomas Aug 24 '22 at 14:51

0 Answers0