1

I'm trying to create function that, for each row, identifies the name of the column in which a specified string occurs.

For example, in this dataset imagine I start with rows 1-3. I want to create a new column desired_column that contains the rows where "foo" occurs.

row1 <- c('a', 'b', 'c', 'foo')
row2 <- c('foo', 'a', 'foo', 'b')
row3<- c('b', 'foo', 'b', 'b')
desired_column <- c('row2', 'row3', 'row2', 'row1')
df <- data.frame(row1, row2, row3, desired_column)

     row1 row2   row3        desired_column
1    a    foo    b           row2
2    b    a      foo         row3
3    c    foo    b           row2
4    foo  b      b           row1

I've tried messing around with functions like which(df == "foo", arr.ind = TRUE) (see here), but that doesn't seem to work, and with iterations of stringr::str_detect() (e.g., here). I've also tried dplyr:contains() like here but can't figure out how to get it to iterate over rows. Help?

Daniel Yudkin
  • 494
  • 4
  • 11
  • Did you see this post? https://stackoverflow.com/questions/32217562/in-r-find-the-column-that-contains-a-string-in-for-each-row – Ben May 07 '20 at 15:03

1 Answers1

4

You can do:

df$desired_column <- names(df)[max.col(cbind(df, FALSE) == "foo", ties.method = "last")]

Which gives:

  row1 row2 row3 desired_column
1    a  foo    b           row2
2    b    a  foo           row3
3    c  foo    b           row2
4  foo    b    b           row1

Using cbind(df, FALSE) ensures NA is returned if there is no match in any particular row.

In response to your comment, grepl() can be used for partial matching or agrepl() for fuzzy matching:

row1 <- c('a', 'b', 'c', 'fool')
row2 <- c('foo', 'a', 'foo', 'b')
row3<- c('b', 'foo', 'b', 'b')
df <- data.frame(row1, row2, row3)
df$desired <- max.col(cbind(sapply(df, grepl, pattern = "foo"), FALSE), ties.method = "last")

Gives:

  row1 row2 row3 desired
1    a  foo    b    row2
2    b    a  foo    row3
3    c  foo    b    row2
4 fool    b    b    row1
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56
  • Thank you for this. Could you possibly expand on this answer to include situations in which the cell isn't an exact match? For example, if one of the cell values was "fool", I would still like it to create a positive match. I tried using %in% but it doesn't yield the same result. – Daniel Yudkin May 07 '20 at 15:14