I have a data frame, trainSmall, with six columns.

> trainSmall
     chr      pos      end LCR gc.50  type
  1:  22 39491638 39491639   0     0 del_L
  2:  22 29434028 29434029   0     0   ins
  3:  22 28347247 28347248   0     0 del_R
  4:  22 40121931 40121932   0     0   ins
  5:  22 39122351 39122352   0     0 del_L
 ---                                      
768:  22 27869380 27869381   0     0 del_R
769:  22 28823159 28823160   0     0   ins
770:  22 24319557 24319558   0     0 del_R
771:  22 38570330 38570331   0     0 del_L
772:  22 48182139 48182140   0     0 del_L
> is.data.frame(trainSmall)
[1] TRUE

I also have a vector, excl, with four items.

> excl
[1] "chr"  "pos"  "end"  "type"

I would like to take all rows of trainSmall, but only the columns not in excl. So I tried

> trainSmall[, !colnames(trainSmall) %in% excl]
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE

But this just gives me another logical vector, not the actual columns from the data frame.

Even doing

> trainSmall[, c(F,F,F,T,T,F)]
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE

doesn't work as I expected.

I'm pretty confused because this seems to be the method advocated in many places (like this answer) for subsetting a data frame. What am I doing wrong?

Response to possible duplicate flag: None of the solutions there seem to work in this case.

> trainSmall[, -which(names(trainSmall) %in% excl)]
[1] -1 -2 -3 -6
> trainSmall[ , !names(trainSmall) %in% excl]
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE
Randoms

2 Answers

You could go for (note the parentheses):

df[, !(colnames(df) %in% excl)]

Another fun way would be to make an operator yourself (doing the opposite of %in%):

excl <- c("chr", "pos", "end", "type")

`%!in%` <- function(x, y) !(x %in% y)
mask <- colnames(df) %!in% excl
df[, mask]

Both will yield

   LCR gc.50
1:   0     0
2:   0     0
3:   0     0
4:   0     0
5:   0     0
Jan
  • Interesting--for me, `trainSmall[, !(colnames(trainSmall) %in% excl)]` still leads to `[1] FALSE FALSE FALSE TRUE TRUE FALSE`. – Randoms Apr 29 '18 at 20:06
  • How can I check? `is.data.frame(trainSmall)` is still `TRUE`. – Randoms Apr 29 '18 at 20:09
  • Looks still like a dataframe to me? `> str(trainSmall) Classes ‘data.table’ and 'data.frame': 772 obs. of 6 variables: $ chr : int 22 22 22 22 22 22 22 22 22 22 ... $ pos : int 39491638 29434028 28347247 40121931 39122351 30102666 30619293 29877272 36923243 49868710 ... $ end : int 39491639 29434029 28347248 40121932 39122352 30102667 30619294 29877273 36923244 49868711 ... $ LCR : num 0 0 0 0 0 0 0 0 0 0 ... $ gc.50: num 0 0 0 0 0 0 0 0 0 0 ... $ type : chr "del_L" "ins" "del_R" "ins" ... - attr(*, ".internal.selfref")= ` – Randoms Apr 29 '18 at 20:10
  • @Randoms: It is a `data.table`, use the approach by @Yannis. – Jan Apr 29 '18 at 20:12
  • Defining the custom operator works, but I need to add two periods in front of `mask`: `trainSmall[, ..mask]` – Randoms Apr 29 '18 at 20:13
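
The behaviour discussed in these comments can be reproduced with a small sketch. The two-row table below is a hypothetical stand-in for trainSmall (same column names, made-up subset of rows), and it assumes the data.table package is installed. It contrasts how a plain data.frame and a data.table treat the same logical mask in the column position.

```r
library(data.table)

# Hypothetical stand-in for trainSmall: same six column names, two rows only.
df <- data.frame(
  chr = c(22L, 22L),
  pos = c(39491638L, 29434028L),
  end = c(39491639L, 29434029L),
  LCR = c(0, 0),
  gc.50 = c(0, 0),
  type = c("del_L", "ins"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)

excl <- c("chr", "pos", "end", "type")
mask <- !(colnames(dt) %in% excl)

# Plain data.frame: a logical vector in the column position selects columns.
from_df <- df[, mask]

# data.table: the second argument is evaluated as an expression,
# so the logical vector itself is returned instead of a subset.
from_dt_expr <- dt[, !(colnames(dt) %in% excl)]

# The ".." prefix makes data.table look up `mask` in the calling scope
# and use its value as a column selector.
from_dt_mask <- dt[, ..mask]

# is.data.frame() is TRUE for both objects, so checking the class
# explicitly is the more telling test.
is_dt <- inherits(dt, "data.table")
```

Here `from_df` and `from_dt_mask` should both hold only the LCR and gc.50 columns, while `from_dt_expr` is a bare logical vector, which matches what the question reports.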

Given the output of your code, I think your data are stored as a data.table (a data.table object has both "data.table" and "data.frame" in its class vector, which is why is.data.frame() returns TRUE). So this should work:

trainSmall[, !excl, with = FALSE]
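
As a rough sketch of how this compares with a couple of equivalent spellings, the example below builds a three-row table with the same column names as trainSmall (the values are made up for illustration) and drops the excluded columns in three ways. The names `keep`, `a`, `b` and `c3` are hypothetical and exist only for this example.

```r
library(data.table)

# Hypothetical three-row table with the same column names as trainSmall.
dt <- data.table(
  chr = 22L,
  pos = 1:3,
  end = 2:4,
  LCR = 0,
  gc.50 = 0,
  type = c("del_L", "ins", "del_R")
)
excl <- c("chr", "pos", "end", "type")

# 1. Negate the character vector and turn off expression evaluation in j.
a <- dt[, !excl, with = FALSE]

# 2. Compute the columns to keep, then reference that variable with "..".
keep <- setdiff(names(dt), excl)
b <- dt[, ..keep]

# 3. Select through .SD, naming the wanted columns in .SDcols.
c3 <- dt[, .SD, .SDcols = keep]
```

All three leave `dt` itself unchanged and return a new table. If the excluded columns are never needed again, `dt[, (excl) := NULL]` removes them by reference instead, which modifies `dt` in place.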
Yannis Vassiliadis