Using R, create a new column in a dataframe based on the values in two existing columns

Question

I have a dataframe which includes two columns in which the data are categorical, with the values T1, T2, T3. I want to create a new column which considers the values in these two columns and returns an output: If col1 is T1 and col2 is T1 = T1 If col1 is T3 and col2 is T3 = T3 And any other variation such as T1 and T3, returns NA. Any thoughts on how to do this would be appreciated. Thanks!

score 1 · Answer 1 · answered Sep 30 '22 at 15:17

If I understand your question correctly, you want to add a third column to your existing dataframe of two columns AND the values of each row in the third column should be based on the values of the first and second columns.

I tried to create the dataframe you mentioned in your question below:

# Create a dataframe
df <- data.frame(col1 = c("T1", "T2", "T3", "T1", "T2", "T3"),
                 col2 = c("T3", "T2", "T1", "T1", "T2", "T3"))

This dataframe has enough combinations of values from col1 and col2 to represent your question.

My approach to your question is to use dplyr::mutate() to create a third column, combined with dplyr::case_when() which allows you to define conditions.

## First install the dplyr package
# install.packages("dplyr")
library(dplyr)

# Add the third column based on your conditions
df <- df %>%
  mutate(col3 = case_when(
    col1 == col2 ~ col1, 
    TRUE ~ NA_character_
  ))

The dataframe df will look like this:

You see that if col1 and col2 have the same value, col3 will have that value also. Otherwise, there will be NA.

I hope this helps.

Following up from this can I use mutate and case_when to return values in a third column based on more conditions than just equality? So for example I would like it so that when col1 is T1 and col2 is T3, col3 returns low_high, and when col1 is T3 and col2 is t1 it returns high_low (and the same for low_low and high_high)? — Gabriel Osborn, Oct 17 '22 at 13:21

score 0 · Answer 2 · answered Sep 30 '22 at 14:10

I suggest doing a left-join (another ref: What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?):

quux <- data.frame(col1 = c("T1", "T3"), col2 = c("T1", "T3"), newval = c("T1", "T3"))
merge(yourdata, quux, c("col1", "col2"), all.x = TRUE)

This method supports "arbitrary" combinations of col1 and col2 with specific results. It uses equality only, where anything that is not explicitly matched by both col1 and col2 will be assigned NA.

Example:

set.seed(42)
yourdata <- data.frame(col1 = sample(c("T1","T2","T3"), size=10, replace=TRUE), col2=sample(c("T1","T2","T3"), size=10, replace=TRUE), col3=sample(letters, size=10, replace=TRUE))
yourdata
#    col1 col2 col3
# 1    T1   T1    d
# 2    T1   T1    e
# 3    T1   T2    m
# 4    T1   T2    e
# 5    T2   T2    t
# 6    T2   T3    b
# 7    T2   T3    h
# 8    T1   T1    c
# 9    T3   T1    z
# 10   T3   T3    a

merge(yourdata, quux, c("col1", "col2"), all.x = TRUE)
#    col1 col2 col3 newval
# 1    T1   T1    e     T1
# 2    T1   T1    c     T1
# 3    T1   T1    d     T1
# 4    T1   T2    m   <NA>
# 5    T1   T2    e   <NA>
# 6    T2   T2    t   <NA>
# 7    T2   T3    b   <NA>
# 8    T2   T3    h   <NA>
# 9    T3   T1    z   <NA>
# 10   T3   T3    a     T3

Using R, create a new column in a dataframe based on the values in two existing columns

2 Answers2