37

I am trying to remove some columns from a data frame. I want to know why this works for a single column but not for multiple columns. For example, this works:

album2[,5]<- NULL

This doesn't work:

album2[,c(5:7)]<- NULL
Error in `[<-.data.frame`(`*tmp*`, , 5:7, value = NULL) : 
replacement has 0 items, need 600

This also doesn't work:

for (i in 5: (length(album2)-1)){
 album2[,i]<- NULL
}
Error in `[<-.data.frame`(`*tmp*`, , i, value = NULL) : 
new columns would leave holes after existing columns
Anoushiravan R
Ahmed Elmahy
  • Try `album2[,5:7]<- list(NULL)` – talat Jan 05 '16 at 17:37
  • It would be great if you could supply a minimal reproducible example to go along with your question, something we can work from to show you how your question might be answered. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at [this SO post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example in R. – Eric Fail Jan 05 '16 at 17:42
  • @EricFail especially as, as far as I can tell, the first example "e.g. this works" doesn't actually work. – doctorG Jan 05 '16 at 17:47
  • @doctorG Using `list(NULL)` made it work with multiple columns; using `NULL` with a single column worked. I will take care of reproducibility in the future. – Ahmed Elmahy Jan 05 '16 at 17:57
  • See [my question here](http://stackoverflow.com/questions/19434778/behavior-of-null-on-lists-versus-data-frames-for-removing-data). – A5C1D2H2I1M1N2O1R2T1 Jan 06 '16 at 02:29
  • @docendodiscimus can you post your comment as an answer please. Your trick with `list(NULL)` works with named columns as well as column indices. The sole posted solution only works with column indices. – Alex Oct 04 '17 at 05:00
  • Also, as of R 3.4.4, running `mtc <- mtcars; class(mtc); mtc; mtc[, 3] <- NULL; mtc; mtc[, 4:6] <- NULL; mtc` will show that deleting columns this way does work. – doctorG Apr 24 '18 at 08:41

8 Answers

64

Basic subsetting:

album2 <- album2[, -5] #delete column 5
album2 <- album2[, -c(5:7)] # delete columns 5 through 7
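
One caveat with negative subsetting, sketched below on a hypothetical three-column data frame (`df` and its column names are made up for illustration): if only one column is left, `[` simplifies the result to a plain vector unless `drop = FALSE` is given.

```r
# Hypothetical toy data frame, used only for illustration.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Dropping two of the three columns leaves a single column, which is
# simplified to a plain vector by default.
as_vector <- df[, -c(2:3)]

# drop = FALSE keeps the result as a one-column data frame.
as_frame <- df[, -c(2:3), drop = FALSE]

is.data.frame(as_vector)
is.data.frame(as_frame)
```

If the code that follows expects a data frame, adding `drop = FALSE` is probably the safer habit.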
doctorG
  • Dropping columns by their position is not something I would recommend. – Jia Gao Apr 21 '18 at 08:24
  • Yes, and no. The OP was posed in context of specifying column positions. If you know the desired positions, then this is fine. For others to know if your comment is useful/relevant to them, could you add why you would not recommend it? – doctorG Apr 22 '18 at 15:40
  • Well, what if someone adds a new column to the data and the column positions change? I agree that your answer is correct, but it's neither safe nor efficient. – Jia Gao Apr 23 '18 at 01:10
  • It's implicit you're at the point of knowing what column numbers you want. Getting to that point is up to you. Considering whether you're doing it interactively or programmatically (and thus what conditions you need to cope with) is also up to you. – doctorG Apr 24 '18 at 08:40
42

Adding answer as this was the top hit when searching for "drop multiple columns in r":

The general version of the single-column removal, e.g. df$column1 <- NULL, is to use list(NULL):

df[ ,c('column1', 'column2')] <- list(NULL)

This works with position indices as well:

df[ ,c(1,2)] <- list(NULL)

This is a more general way to drop columns and, as some comments have mentioned, removing by indices isn't recommended. Also, the familiar negative subset (used in other answers) doesn't work for columns given as strings:

> iris[ ,-c("Species")]
Error in -"Species" : invalid argument to unary operator
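
As a minimal sketch of the points above, here is the `list(NULL)` assignment applied by name and by position to a hypothetical data frame (`df` and its columns are invented for this example). The last line uses `setdiff()` as one possible name-based stand-in for the negative subset; that part is my own addition, not something the answer prescribes.

```r
# Hypothetical data frame, used only for illustration.
df <- data.frame(
  id = 1:3,
  x  = c(0.1, 0.2, 0.3),
  y  = c("a", "b", "c"),
  z  = c(TRUE, FALSE, TRUE)
)

# Remove columns by name with list(NULL).
by_name <- df
by_name[, c("x", "y")] <- list(NULL)

# Remove the same columns by position.
by_pos <- df
by_pos[, c(2, 3)] <- list(NULL)

# Name-based alternative to a negative subset: keep everything except x and y.
kept <- df[, setdiff(names(df), c("x", "y")), drop = FALSE]

names(by_name)
names(by_pos)
names(kept)
```

All three results should contain only the `id` and `z` columns. Note that the `list(NULL)` form modifies the data frame in place, while the `setdiff()` form returns a new one.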
Andrew Haynes
12

This works for me.

x <- dplyr::select(dataset_df, -c('column1', 'column2'))
marc_s
Dulakshi Soysa
9

If you only want to remove columns 5 and 7 but not 6 try:

album2 <- album2[,-c(5,7)] #deletes columns 5 and 7
Yoh Deadfall
Kara
7

@Ahmed Elmahy The following approach should help when you have a vector of column names that you want to remove from your data frame:

test_df <- data.frame(col1 = c("a", "b", "c", "d", "e"), col2 = seq(1, 5), col3 = rep(3, 5))
rm_col <- c("col2")
test_df[, !(colnames(test_df) %in% rm_col), drop = FALSE]

All the best, ExploreR

ExploreR
2

Another solution, similar to @Dulakshi Soysa, is to use column names and then assign a range.

For example, suppose our data frame df has columns named column_1, column_2, column_3 and so on up to column_15, and we want to delete the 5th through the 10th columns.

We can specify a range using column names e.g.,

library(dplyr)
x = select(df, -c('column_5':'column_10'))

Specifying a range can save some time when you are deleting multiple adjacent columns. It can also be combined with non-adjacent columns. For example, if you want to remove the 1st column in addition to the previously specified columns, you would update the code as below:

library(dplyr)
x = select(df, -c('column_1', 'column_5':'column_10'))
Sandy
1

The following line returns the data frame 'data' without col_1 and col_2; assign the result back to 'data' to keep the change:

data[!(colnames(data) %in% c('col_1','col_2'))]
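
If this pattern is used repeatedly, it could be wrapped in a small helper. The sketch below is only an illustration: `drop_cols` is a hypothetical name, not a base R function, and the warning for unknown column names is a design choice of mine rather than part of the answer.

```r
# Hypothetical helper (not part of base R): return `data` without the
# columns named in `cols`, warning about any names that are not present.
drop_cols <- function(data, cols) {
  missing_cols <- setdiff(cols, colnames(data))
  if (length(missing_cols) > 0) {
    warning("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even if a single column remains.
  data[, !(colnames(data) %in% cols), drop = FALSE]
}

# Usage on the built-in mtcars data set.
trimmed <- drop_cols(mtcars, c("mpg", "cyl"))
colnames(trimmed)
```

Because the helper returns a new data frame, the original `mtcars` is left untouched.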
Anoushiravan R
Jacob
1

Here is an interesting solution I read the other day in @JoachimSchork's blog, Statistics Globe. You can remove columns by column name. You can find out more here.

library(data.table)

mtcars2 <- mtcars

setDT(mtcars2)[, c("mpg", "cyl", "disp", "hp") := NULL]

> head(mtcars2)
   drat    wt  qsec vs am gear carb
1: 3.90 2.620 16.46  0  1    4    4
2: 3.90 2.875 17.02  0  1    4    4
3: 3.85 2.320 18.61  1  1    4    1
4: 3.08 3.215 19.44  1  0    3    1
5: 3.15 3.440 17.02  0  0    3    2
6: 2.76 3.460 20.22  1  0    3    1

Anoushiravan R