1

I'd like a new column in a data.frame to indicate whether, for each row, the number "2" appears in certain other columns. Here's a simple version that works for a small data.frame:

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- df$mycol.1 ==2 | df$mycol.2 ==2
df

  mycol.1 mycol.2 other.col mycols.contain.two
1       1       5        -2              FALSE
2       2       4        -1               TRUE
3       3       3         0              FALSE
4       4       2         1               TRUE
5       5       1         2              FALSE

Now suppose the data.frame has 50 columns, and I want the new column to indicate whether any of the columns beginning with "mycol" contain a "2" in each row, without having to use the "|" symbol 49 times. I assume there's an elegant dplyr answer using starts_with(), but I can't figure out the syntax.

conflictcoder
  • 383
  • 1
  • 9
  • 1
    ```df$new_col = rowSums(df[, 1:2] == 2) > 0``` you can use ```grep()``` to select only certain columns using a name search. – Cole Aug 24 '19 at 21:47
  • 1
    It will be easier to help / find relevant posts already on SO if you add a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Aug 24 '19 at 22:02

3 Answers3

2

You could do:

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$TYPE <- ifelse(rowSums(ifelse(sapply(df, function (x){x == 2}), 1, 0)) > 0 , "TRUE", "FALSE")

# > df
# mycol.1 mycol.2 other.col  TYPE
# 1       1       5        -2 FALSE
# 2       2       4        -1  TRUE
# 3       3       3         0 FALSE
# 4       4       2         1  TRUE
# 5       5       1         2  TRUE
MAPK
  • 5,635
  • 4
  • 37
  • 88
1

You can achieve it by indexing. Let's take the mtcars data.

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

After that, we can index any column. Say we are interesting in columns 8 to 11,

 mtcars$new <- rowSums(mtcars[,8:11]==2)>0

gives,

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb   new
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 FALSE
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 FALSE
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 FALSE
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 FALSE
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 TRUE
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 FALSE
> 
maydin
  • 3,715
  • 3
  • 10
  • 27
0

You could use a simple apply() loop:

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- apply(df, 1, function(x){any(x == 2)})

or if you want to check only the first 3 columns:

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- apply(df, 1, function(x){any(x[1:3] == 2)})
Sbamo
  • 51
  • 4