0

Newbie using R, and would like to know how to create a new column in a data frame based on data from another data frame. Suppose I have 2 data frames, linked by letters "a", "b", and "c" in df1 col2 and df2 col1 as follows:

> col1<-c(1, 2, 3, 4)
> col2<-c("a","b","c","c")
> df1<-data.frame(col1, col2)
> df1
  col1 col2
1    1    a
2    2    b
3    3    c
4    4    c
> c1<-c("a","b","c")
> c2<-c("Jim","Sue","Bob")
> c3<-c("abc","def","ghi")
> df2<-data.frame(c1,c2,c3)
> df2
  c1  c2 c3
1  a Jim abc
2  b Sue def
3  c Bob ghi

I want to add a column3 to df1 to add user names based on the values of "a", "b", or "c" in df1. That is, how do I get the following using R?

> df1
  col1 col2 col3
1    1    a Jim
2    2    b Sue
3    3    c Bob
4    4    c Bob

I've tried df1["col3"]<-df2[df1$col2==df2$c1]$c2 but it's not working.

Note: I only want to add one column from df2 to df as shown above (e.g. not all columns in df2).

user46688
  • 733
  • 3
  • 11
  • 29
  • 2
    look at merge or match. I like `left_join` from `dplyr`. For completeness, it would just be `merge(df1, df2, by.x="col2", by.y="c1")` – Rorschach Nov 01 '15 at 19:55

2 Answers2

1

You need merge().

# this merges the two data frames and rebinds the result to df1 
df1 <- merge(df1, df2, by.x = "col2", by.y = "c1")
# This renames the "c1" column to "col3"
names(df1) <- c("col2", "col1", "col3")
# This rearranges the columns in the right order
df1 <- df1[,c(2,1,3)] 

The result is:

 col1 col2 col3
1    1    a  Jim
2    2    b  Sue
3    3    c  Bob
4    4    c  Bob

Type ?merge to see how it works.

kliron
  • 4,383
  • 4
  • 31
  • 47
  • Thanks @kliron, this apparently merges all columns in `df2`... how to merge only the specified column `c2` in `df2`? – user46688 Nov 01 '15 at 20:07
  • What do you mean? This is exactly the output you wanted in your question. If you have more columns that you don't need in your df2 (that you don't show in your question) just subset your df2 keeping only the columns you want and merge that data frame to df1 instead. – kliron Nov 01 '15 at 20:13
  • You can also subset the resulting df1 or even drop individual columns by setting them to NULL like this: df1$unwantedColumn <- NULL – kliron Nov 01 '15 at 20:14
  • I figured it out... `df1 <- merge(df1, df2[c("c1","c2")], by.x = "col2", by.y = "c1")` – user46688 Nov 01 '15 at 20:17
1
> merge(df1, df2, by.x = "col2", by.y = "c1")
  col2 col1  c2
1    a    1 Jim
2    b    2 Sue
3    c    3 Bob
4    c    4 Bob
Ken Benoit
  • 14,454
  • 27
  • 50
  • Only `c2` is added n this answer (which matches yours except the column name of `c2` is called `col3` in your question. You can change that with `names(result)[3] <- "col3"`. – Ken Benoit Nov 01 '15 at 20:16