Attempting to drop numerous columns in R

Question

I have a list of column names in an excel file (sds.drop.csv) that I want to drop from a dataframe already imported into R.

I attempted to read the column names into R as follows

sds.drop <- as.list(read.csv("sds.drop.csv", header = F))

With the intention of then running the code below to drop them from the dataframe called 'dat'

dat1 <-  dat[, !(names(dat) %in% sds.drop)]

However, no columns are dropped. I'm guessing the issue resides in the way I am reading in the data. I attempted to read the data in without the as.list() command and it still did not work. Any thoughts?

Here is what the sds.drop.csv df looks like...

> head(read.csv("sds.drop.csv", header = F))
   V1
1 Q20
2 Q23
3 Q24
4 Q25
5 Q26
6 Q27

Did you mean to do `sds.drop <- unlist(read.csv("sds.drop.csv", header = F)); dat1 <- dat[,!names(dat) %in% sds.drop]`? — Roland, Apr 26 '15 at 16:53
@Roland That still doesn't work, and my df is a factor. I think the solution is to put " " on both sides of each value in the sds.drop df. Do you know how to add those to each variable? paste( ) ? — b222, Apr 26 '15 at 16:57
did you mean `dat1 <- dat[,!names(dat) %in% sds.drop]`? (I suspect `!names(sds.drop) %in% sds.drop` is not the selection you want). Also, hard to tell if anything is wrong with the format of sds.drop without knowing anything about it. can you add `head(read.csv("sds.drop.csv", header = F))` to your post?` — Jthorpe, Apr 26 '15 at 17:02
@Jthorpe correct, good catch. I fixed that but it still didn't drop the columns. I added the output from `head(read.csv("sds.drop.csv", header = F))`... it is a factor with 191 levels. — b222, Apr 26 '15 at 17:12

score 1 · Accepted Answer · answered Apr 26 '15 at 17:22

1

I think your main problem is that you're not dereferencing (i.e. extracting) the column of column names from the data.frame that is returned by read.csv(). It also makes sense to coerce to character, since that's what you need to operate on. Thus, you should be assigning sds.drop from as.character(read.csv('sds.drop.csv',header=F)[,1]);:

dat <- data.frame(Q20=1:3, Q23=4:6, Q24=7:9, Q25=10:12, Q26=13:15, Q27=16:18, Q30=19:21, Q31=22:24 );
dat;
##   Q20 Q23 Q24 Q25 Q26 Q27 Q30 Q31
## 1   1   4   7  10  13  16  19  22
## 2   2   5   8  11  14  17  20  23
## 3   3   6   9  12  15  18  21  24
sds.drop <- as.character(read.csv('sds.drop.csv',header=F)[,1]);
sds.drop;
## [1] "Q20" "Q23" "Q24" "Q25" "Q26" "Q27"
dat[,!names(dat)%in%sds.drop];
##   Q30 Q31
## 1  19  22
## 2  20  23
## 3  21  24

answered Apr 26 '15 at 17:22

bgoldst

34,190
6
38
64

This worked. Thank you, @bgoldst. I tried the as.character( ) command before, but the letters in my dataframe dropped. I think this was remedied by the `[,1]`... can you please explain what that means? – b222 Apr 26 '15 at 17:44
1

Sure. `[,1]` is a form of indexing which basically says "all rows, column 1". The `read.csv()` function returns a data.frame (which is a table of data) with one row per data line and one column per delimited field from the CSV file, so prior to indexing, you didn't actually have a list of strings, you had a table of data. Running `as.character()` on a data.frame doesn't really make sense, in fact I can't explain the output I'm getting (`"1:6"`). But by indexing column 1, you get a vector of the column data, which can be coerced to a character vector (from a factor). – bgoldst Apr 26 '15 at 17:49
1

Actually, instead of calling `as.character()`, you could use the `stringsAsFactors` argument of `read.table()` by setting it to false. See https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html. – bgoldst Apr 26 '15 at 17:51

score 1 · Answer 2 · answered Apr 26 '15 at 17:54

1

    #keep only the headers from the csv
    cols = colnames(read.csv('C:/myfile.csv',colClasses='character',nrows =    1,header=TRUE)[-1, ])

    #subset dataframe excluding colnames in cols
    df2 <- yourdataframe[!names(yourdataframe) %in% cols]

answered Apr 26 '15 at 17:54

Amrita Sawant

10,403
4
22
26

score 0 · Answer 3 · edited May 23 '17 at 12:22

0

Try:

# for example (you are reading this in from Excel)
dat <- data.frame("goodcol" = c(1,2), "badcol1"= c(-1,-2),
                  "badcol2"= c(-2,-4), "goodcol2" = c(2,4))

sds.drop <- c("badcol1", "badcol2")

dat <-  dat[, !(names(dat) %in% sds.drop)]

You may have been missing a bracket and needed to refer to the dateframe in names().

EDIT For those who later search - Similar to Drop data frame columns by name

edited May 23 '17 at 12:22

Community

1
1

answered Apr 26 '15 at 17:15

micstr

5,080
8
48
76

Thanks @micstr, but still doesn't work. I fixed the code and added the brackets around names, but to no avail. – b222 Apr 26 '15 at 17:21
This is not a precedence issue; `%any%` special operators have higher precedence than negation (`!`). See https://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html. – bgoldst Apr 26 '15 at 17:29
Just check your sds.drop type is not numeric but a list? Will try check your data later. – micstr Apr 26 '15 at 18:39

Attempting to drop numerous columns in R

3 Answers3