
I'm trying to create a new column ("newcol") in a dataframe ("data"), whose values will be determined by the contents of up to two other columns in the dataframe ("B_stance" and "C_stance"). The values within B_stance are either "L", "R", "U" or "N". Within C_stance they are either "L" or "R".

Please excuse the semi-logical language, but I need R code which will achieve this for the contents of newcol:

if (data$B_stance = "L" AND data$C_stance = "L") then (data$newcol = "N")
if (data$B_stance = "L" AND data$C_stance = "R") then (data$newcol = "Y")
if (data$B_stance = "R" AND data$C_stance = "R") then (data$newcol = "N")
if (data$B_stance = "R" AND data$C_stance = "L") then (data$newcol = "Y")
if (data$B_stance = "U") then (data$newcol = "N")
if (data$B_stance = "N") then (data$newcol = "N")

I've tried to see if/how "ifelse" could achieve this, but cannot find an example of how to draw from multiple column values in determining the new value.

Andrew Lavers
S_Brown

3 Answers


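It may be easier to create a key/value dataset and then do a join:

library(dplyr)

# Lookup table: one row per (B_stance, C_stance) combination with an explicit result
keydat <- data.frame(B_stance = c('L', 'L', 'R', 'R'),
                     C_stance = c('L', 'R', 'R', 'L'),
                     newcol   = c('N', 'Y', 'N', 'Y'),
                     stringsAsFactors = FALSE)

# Join on both key columns; combinations missing from keydat come back as NA
# and are then replaced by 'N'
left_join(data, keydat, by = c('B_stance', 'C_stance')) %>%
  mutate(newcol = replace(newcol, is.na(newcol), 'N'))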
akrun
  • Correct me if I'm wrong, but this just created a new dataframe with four rows for each of the first four if-then combinations I listed (and then joined that DF to the original). What I need is something that will **calculate** a new value – either "Y" or "N" – for each row, according to what its values are within B_stance and C_stance. (Or, for the last two if-then lines, just according to what the value is within B_stance). Each row in the original DF would then have a new "Y"/"N" value, all of which would be stored in a column. Sorry if this was unclear. – S_Brown Jul 16 '18 at 17:37
  • @S_Brown Here, we assumed that 'newcol' does not yet exist in your 'data'. By creating a key/value dataset whose new column holds the result for each combination of the key columns 'B_stance' and 'C_stance', the join looks up the key values of every row in 'data' and creates 'newcol' in the result. If a combination is not in the key/value dataset, the value will be NA. Here, I assumed that you wanted all the other combinations to be 'N' – akrun Jul 16 '18 at 17:40
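To make the point in the comments concrete, here is a minimal base R sketch of the same lookup idea without a join. It shows that every row of the original data gets its own value. The sample rows and the names `lookup` and `key` are assumptions made for illustration, not part of the original question.

```r
# Illustrative sample data (assumed; the question did not include any)
data <- data.frame(
  B_stance = c("L", "L", "U", "R", "R", "N"),
  C_stance = c("R", "L", "L", "L", "R", "R"),
  stringsAsFactors = FALSE
)

# Named vector acting as the key/value table: "B_C" -> result
lookup <- c("L_L" = "N", "L_R" = "Y", "R_R" = "N", "R_L" = "Y")

# Build one key per row and look it up; unknown keys come back as NA
key <- paste(data$B_stance, data$C_stance, sep = "_")
newcol <- unname(lookup[key])

# Every combination not listed in the lookup defaults to "N"
newcol[is.na(newcol)] <- "N"
data$newcol <- newcol

data
```

The result has the same number of rows as the input, with one "Y" or "N" computed per row.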

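In base R, the ifelse function is the most direct tool for these conditions. The dplyr library includes a more robust if_else function and a case_when function. ifelse works element-wise: where the first argument (the test) is TRUE it returns the corresponding value of the second argument, and where the test is FALSE it returns the corresponding value of the third argument. Conditions on several columns can be combined with & inside the test.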

data <- read.table(text="
B_stance C_stance
L R
L L
U X
R L
R R
N X
X X
", header= TRUE)


data$newcol <- ifelse(data$B_stance == "L" & data$C_stance == "L", "N",
                     ifelse(data$B_stance == "L" & data$C_stance == "R", "Y",
                            ifelse(data$B_stance == "R" & data$C_stance == "R", "N",
                                   ifelse(data$B_stance == "R" & data$C_stance == "L", "Y",
                                          ifelse(data$B_stance == "U", "N",
                                                 ifelse(data$B_stance == "N", "N",
                                                        NA))))))

data

#   B_stance C_stance newcol
# 1        L        R      Y
# 2        L        L      N
# 3        U        X      N
# 4        R        L      Y
# 5        R        R      N
# 6        N        X      N
# 7        X        X   <NA>
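The six rules in the question reduce to one statement: newcol is "Y" only when both columns hold "L" or "R" and the two values differ. A possible shorter sketch under that reading is below. The helper name `both_sided` is an assumption. Note one deliberate difference from the nested version: values outside the listed ones (such as "X") give "N" rather than NA.

```r
data <- read.table(text = "
B_stance C_stance
L R
L L
U X
R L
R R
N X
X X
", header = TRUE, stringsAsFactors = FALSE)

# TRUE where both columns contain a side ("L" or "R")
both_sided <- data$B_stance %in% c("L", "R") & data$C_stance %in% c("L", "R")

# "Y" only when both are sides and they differ; everything else is "N"
data$newcol <- ifelse(both_sided & data$B_stance != data$C_stance, "Y", "N")

data
```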
Andrew Lavers
  • Thanks, this looks like it should work. However I'm getting an error message: Error in `$<-.data.frame`(`*tmp*`, newcol, value = logical(0)) : replacement has 0 rows, data has 2727. Any thoughts? – S_Brown Jul 16 '18 at 18:24
  • No worries – spotted and fixed the bug. Now working perfectly. Thanks again! – S_Brown Jul 16 '18 at 18:30
  • @S_Brown also note that you will get more answers if you include sample data and I suggest you review https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Andrew Lavers Jul 16 '18 at 18:33
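The thread does not say what the bug behind the "replacement has 0 rows" error was. One common way to trigger that message, shown here only as an assumed illustration, is a misspelled column name: `$` returns NULL for a column that does not exist, so the whole ifelse call collapses to a zero-length result.

```r
data <- data.frame(
  B_stance = c("L", "R"),
  C_stance = c("R", "R"),
  stringsAsFactors = FALSE
)

# "B_Stance" (capital S) is a deliberate typo: the column does not exist,
# so data$B_Stance is NULL and the comparison yields logical(0)
bad_value <- ifelse(data$B_Stance == "L" & data$C_stance == "R", "Y", "N")
length(bad_value)

# Assigning a zero-length vector as a column raises the error; capture its message
msg <- tryCatch(
  {
    data$newcol <- bad_value
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Checking `names(data)` before building the condition is a quick way to rule this cause out.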

With dplyr you can use case_when. It's a little cleaner than nested if_elses if you have numerous conditions.

library(dplyr)

df <- data.frame(
  B_stance = c('L', 'L', 'R', 'R'),
  C_stance = c('L', 'R', 'R', 'L'),
  stringsAsFactors = FALSE
)

df %>% mutate(
  newcol = case_when(
    B_stance == 'U'                   ~ 'N',
    B_stance == 'N'                   ~ 'N',
    B_stance == 'L' & C_stance == 'L' ~ 'N',
    B_stance == 'L' & C_stance == 'R' ~ 'Y',
    B_stance == 'R' & C_stance == 'L' ~ 'Y',
    B_stance == 'R' & C_stance == 'R' ~ 'N',
    TRUE                              ~ B_stance
  )
)

#   B_stance C_stance newcol
# 1        L        L      N
# 2        L        R      Y
# 3        R        R      N
# 4        R        L      Y

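Note that case_when checks the conditions in order: for each row, the value comes from the first condition that is TRUE, and later conditions are ignored for that row. The final TRUE is a fallback for rows where none of the earlier conditions match; here it returns the row's B_stance value unchanged.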
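The sample data above contains no "U", "N", or unexpected values, so the ordering and the fallback never come into play. The sketch below uses assumed sample rows that do exercise them, and swaps the fallback to NA_character_ so unmatched rows are easy to spot. The third condition is looser than the second and is only reached for rows where the second one is FALSE.

```r
library(dplyr)

# Assumed sample rows, including "U", "N", and an unexpected "X"
df <- data.frame(
  B_stance = c("L", "R", "U", "N", "X"),
  C_stance = c("R", "R", "L", "R", "L"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    newcol = case_when(
      B_stance %in% c("U", "N")                              ~ "N",
      B_stance %in% c("L", "R") & C_stance %in% c("L", "R") &
        B_stance != C_stance                                 ~ "Y",
      # Only reached when the stricter condition above did not match
      B_stance %in% c("L", "R") & C_stance %in% c("L", "R")  ~ "N",
      # Fallback for anything not covered above
      TRUE                                                   ~ NA_character_
    )
  )

result
```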

Werner