2

In R, how can I check per row if any value in a list (eg. 2, 3 or 4) is present in any one of three columns and then change that row in a fourth column?

Say I have a df:

 A B C D
 1 1 1
 2 1 1
 3 1 1

I would like to write (without a for-loop) if row n (column A or B or C) == 2 or 3 or 4 then D[1,]=1, else = 0

Basically check row wise if my numbers are present in any of three specific columns and, if so, update a forth column with 1, if not a 0.

Thanks,

Sotos
  • 51,121
  • 6
  • 32
  • 66

6 Answers6

2

Here is how you can use dplyr:

library(dplyr)
test <- data.frame(A = c(1, 2, 3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))

testColumns <- c(2, 3, 4)                         # Values you want to flag 

Now that we have our data frame and a vector with the values we want to flag in a new column, let's use rowwise() to tell R to look at each row of the data frame, and then a combination of mutate() to create a new column, D, based off of various cases.
We specify the test cases and then their desired values using case_when().

Here's how we do it:

test <- test %>% 
  rowwise() %>%                                   # Look at test on a 'by row' basis'
  mutate(D = case_when(A %in% testColumns ~ 1,    # use mutate to create a new column D
                       B %in% testColumns ~ 1,
                       C %in% testColumns ~ 1, 
                       TRUE               ~ 0))

This gives us the following table:

print(test)
## A tibble: 3 x 4
#      A     B     C     D
#  <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     0
#2     2     1     1     1
#3     3     1     1     1

Here are some helpful links for a few of the functions we used:
mutate()
rowwise()
case_when()

OTStats
  • 1,820
  • 1
  • 13
  • 22
  • One advantage of using `data.table` is that you can use `.SD` instead of specifying every column name, so that it is equally easy to perform this on a table with a large number of columns. Of course, this may be possible with `dplyr`, I am not very familiar :) – Walker in the City Dec 11 '18 at 22:54
  • That is a good question, I'm not sure how that would work. Looking into it! – OTStats Dec 12 '18 at 00:39
  • Thank you for your help. Do I have to create a new data frame or could I just add column D to "test"? I will run multiple tests, creating 8 new columns which I would like to append to my data frame. – William Brunzell Dec 12 '18 at 08:26
  • In that case, you can utilize `%<>%`, this is called the compound assignment pipe operator. This works just like the regular pipe operator `%>%`, but it will assign the result back to the original object. See this link for help: [Pipes in R Tutorial For Beginners](https://www.datacamp.com/community/tutorials/pipe-r-tutorial) – OTStats Dec 12 '18 at 14:47
  • I updated my response, using the forward pipe as I indicated above. – OTStats Dec 12 '18 at 14:54
  • @WilliamBrunzell Using `data.table` would allow you to add the new column by reference, which is especially useful if you are adding new columns repeatedly (if not done by reference this is memory inefficient as a new data frame is reallocated for every assignment). However, do you want to add the new columns immediately? If you performed this test iteratively, every new column that stores the results of the check would also be checking the previous results columns. It may make sense to store the 8 columns and then append them at the end. – Walker in the City Dec 12 '18 at 17:45
  • Spelling out each column name seems very impractical for larger datasets. – Tea Tree Mar 24 '21 at 19:34
  • @TeaTree Indeed. But the question specified that the there were 3 specific columns. While this might not be a scalable answer for `n` number of columns to which to look up (compared to other solutions that use `apply`, `map`, etc), my solution does achieve the desired result. There is value various types of solutions. – OTStats Mar 25 '21 at 14:19
2

You could use apply :

vec <- 2:4
df1$D <- apply(df1,1, function(x) any(vec %in% x)) +0
#   A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1

Or a tidyverse version, possibly more efficient as apply involve some matrix conversions :

library(tidyverse)
df1 %>% mutate(D = pmap_int(.,~any(vec %in% .)))
#   A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1

data

df1 <- data.frame(A = c(1, 2,3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
1

With only these three condition you could do

df1$D <- as.integer(rowSums(df1 == 2 | df1 == 3 | df1 == 4) >= 1) # or maybe df1 >=2 & df1 <= 4
df1
#  A B C D
#1 1 1 1 0
#2 2 1 1 1
#3 3 1 1 1

A more general approach could be

vec <- 2:4
df1$D <- Reduce(`+`, lapply(vec, \(x) rowSums(df1 == x) > 0))

data

df1 <- structure(list(A = 1:3, B = c(1L, 1L, 1L), C = c(1L, 1L, 1L)), .Names = c("A", 
"B", "C"), class = "data.frame", row.names = c(NA, -3L))
markus
  • 25,843
  • 5
  • 39
  • 58
0

Here is a way to do it using data.table:

library(data.table)
test <- data.table(A = c(1, 2,3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))
checkValues <- c(2, 3, 4)

test[, c("D"):= Reduce(`|`, lapply(.SD, function(x){x %in% checkValues}))]

test
   A B C     D
1: 1 1 1 FALSE
2: 2 1 1  TRUE
3: 3 1 1  TRUE

It would be easy to replace FALSE=0, TRUE=1 (replace Reduce(|, lapply(.SD, function(x){x %in% c(2, 3, 4)})) with as.numeric(Reduce(|, lapply(.SD, function(x){x %in% c(2, 3, 4)}))), but it seems that you are using D to hold a logical value so it would make sense to me to have it as a logical vector.

This also updates test to have column D by reference, which is more efficient.

Two answers to perhaps look at as well would be: Finding rows containing a value (or values) in any column and Add multiple columns to R data.table in one function call?

Walker in the City
  • 527
  • 1
  • 9
  • 22
0

One way to do it in tidyverse:

df %>%
 rowid_to_column() %>% #Creating an unique row ID
 gather(var, val, -rowid) %>% #Transforming the data from wide to long
 group_by(rowid) %>% #Grouping
 mutate(D = ifelse(any(val %in% c(2, 3, 4)), 1, 0)) %>% #Testing whether any value from a given row is in the specified list 
 spread(var, val) %>% #Returning the data to wide format
 ungroup() %>%
 select(-rowid) #Deleting the redundant variable

      D     A     B     C
  <dbl> <int> <int> <int>
1    0.     1     1     1
2    1.     2     1     1
3    1.     3     1     1
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • 1
    This is a very convoluted approach to a simple problem. Using `rowwise()` and mutate would be a much simpler approach. – OTStats Dec 11 '18 at 22:06
  • @OTStats your approach is definitely more elegant and straightforward than mine. However, I think it is good to see different approaches to the same problem. – tmfmnk Dec 11 '18 at 22:09
  • I definitely agree - helps with many different types of problems. – OTStats Dec 11 '18 at 22:14
0

Parameterized for column names and numbers of interest.

library(tidyverse)

data <-
  data.frame(
    A = c(1, 2, 3), 
    B = c(1, 1, 1), 
    C = c(1, 1, 1)
  )

nums <- c(2, 3, 4)
cols <- c('A', 'B', 'C')

data$D <-
  data[, cols] %>%
  map(~.x %in% nums) %>%
  reduce(`|`)
Eric
  • 3,403
  • 1
  • 19
  • 18