
I have a messy input file with lots of completely blank columns, and I'm trying to remove them like this (all the empty columns are un-named so R assigns V1, V2, etc.):

df1[,-grep("V\\d+", colnames(df1))]

However, the line above just returns the vector of negated column indices for the empty columns (-1, -2, -3, -7, -10...), and doesn't actually remove each column the way df1[, -c(1, 2, 3, 7, 10)] would.

Do I need to pass the vector differently?

Sample data (sanitized). It was stored as a data.table and converted to a data.frame for dput():

structure(list(V1 = c(NA, NA, NA, NA, NA), V2 = c(NA, NA, NA, 
NA, NA), `Employee Name` = c("", "Bob", "", "Bob", "Bob"), V4 = c(NA, 
NA, NA, NA, NA), V5 = c(NA, NA, NA, NA, NA), `Question 1` = c("", 
"--", "", "Yes", ""), V7 = c(NA, NA, NA, NA, NA), V8 = c(NA, 
NA, NA, NA, NA), `Question 2` = c("", "No", "", "Yes", ""), V10 = c(NA, 
NA, NA, NA, NA), V11 = c(NA, NA, NA, NA, NA), `Question 3` = c("", 
"--", "", "Yes", ""), V13 = c(NA, NA, NA, NA, NA), V14 = c(NA, 
NA, NA, NA, NA), `Question 4` = c("", "--", "", "Yes", ""), V16 = c(NA, 
NA, NA, NA, NA), V17 = c(NA, NA, NA, NA, NA), V18 = c(NA, NA, 
NA, NA, NA), V19 = c(NA, NA, NA, NA, NA), V20 = c(NA, NA, NA, 
NA, NA), `Question 5` = c("", "--", "", "Yes", ""), V22 = c(NA, 
NA, NA, NA, NA), V23 = c(NA, NA, NA, NA, NA), V24 = c(NA, NA, 
NA, NA, NA), V25 = c(NA, NA, NA, NA, NA), `Question 6` = c("", 
"--", "", "Yes", ""), V27 = c(NA, NA, NA, NA, NA), V28 = c(NA, 
NA, NA, NA, NA), V29 = c(NA, NA, NA, NA, NA), V30 = c(NA, NA, 
NA, NA, NA), V31 = c(NA, NA, NA, NA, NA)), .Names = c("V1", "V2", 
"Employee Name", "V4", "V5", "Question 1", "V7", "V8", "Question 2", 
"V10", "V11", "Question 3", "V13", "V14", "Question 4", "V16", 
"V17", "V18", "V19", "V20", "Question 5", "V22", "V23", "V24", 
"V25", "Question 6", "V27", "V28", "V29", "V30", "V31"), row.names = c(NA, 
5L), class = "data.frame")
Mako212
  • `df1[, -c(grep("V\\d+", colnames(df1)))]` or `df1[, -c(grep("V\\d+", colnames(df1), value = TRUE))]`? – Jaap Jul 10 '17 at 16:10
  • @Jaap that throws an `invalid argument to unary operator` error – Mako212 Jul 10 '17 at 16:12
  • @Jaap and to clarify post your edit, `value = TRUE` throws the above error, adding the `-c` wrapper just returns the vector of column indices – Mako212 Jul 10 '17 at 16:14
  • Your code should work, maybe adding a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) would help clarify your problem. – Jaap Jul 10 '17 at 16:18
  • @Jaap just added a sample data set – Mako212 Jul 10 '17 at 16:19
  • add `with = FALSE`: `df1[,-grep("V\\d+", colnames(df1)), with = FALSE]` – Jaap Jul 10 '17 at 16:20
  • @Jaap bingo, thank you – Mako212 Jul 10 '17 at 16:22
  • You can make `class = c("data.table", "data.frame")` to have the `dput` give a data table. – Rich Scriven Jul 10 '17 at 16:47
  • To be clear, the example works fine with the base R data.frame sample data. If you're using data.table, you need to **a.** show it in your example, **b.** tag it, and **c.** to solve your problem use its syntax, which differs for `[`, as Jaap alluded to above. – alistaire Jul 10 '17 at 17:49
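
To illustrate the last comment, here is a minimal base R sketch showing that the negative-index approach does drop the columns when the object is a plain data.frame. The three-column `df1` below is a hypothetical miniature of the sample data, not the original file, and `drop = FALSE` is added only so the result stays a data.frame when a single column is left.

```r
# Hypothetical miniature of the sample data as a plain data.frame.
# check.names = FALSE keeps the space in "Employee Name".
df1 <- data.frame(
  V1 = c(NA, NA),
  `Employee Name` = c("Bob", "Bob"),
  V3 = c(NA, NA),
  check.names = FALSE
)

# On a data.frame, negative positions in the column slot drop those columns.
result <- df1[, -grep("V\\d+", colnames(df1)), drop = FALSE]

names(result)
```

On a data.table the same expression is evaluated as an ordinary R expression inside `[`, which is why the question saw the index vector come back instead.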

1 Answer


The comments point out how to do the regex using grep to select the columns, but since you're using data.table you can also delete the V## columns by reference.

dat[, grep("V\\d+", colnames(dat)) := NULL]

   Employee Name Question 1 Question 2 Question 3 Question 4 Question 5 Question 6
1:                                                                                
2:           Bob         --         No         --         --         --         --
3:                                                                                
4:           Bob        Yes        Yes        Yes        Yes        Yes        Yes
5:           Bob                                                                  
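
For comparison, the two data.table approaches mentioned on this page can be put side by side. This is only a sketch: the small `dat` table is a hypothetical stand-in for the real data, the helper name `blank_cols` is made up for illustration, and the anchored pattern `^V\\d+$` is a slightly stricter variant of the regex above, assumed here so that only names consisting of "V" plus digits are matched.

```r
library(data.table)

# Hypothetical stand-in for the question's data: two blank auto-named
# columns (V1, V3) around two real ones.
dat <- data.table(
  V1 = c(NA, NA),
  `Employee Name` = c("Bob", "Bob"),
  V3 = c(NA, NA),
  `Question 1` = c("--", "Yes")
)

# Positions of the columns whose names are exactly "V" followed by digits.
blank_cols <- grep("^V\\d+$", names(dat))

# Option 1: with = FALSE treats j as column positions and returns a new
# data.table; dat itself is left unchanged.
kept <- dat[, -blank_cols, with = FALSE]

# Option 2: := NULL deletes the columns by reference, modifying dat in place.
dat[, (blank_cols) := NULL]

names(kept)
names(dat)
```

The `with = FALSE` form is convenient when the original table should be preserved; the `:= NULL` form avoids making a copy, which can matter for large tables.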
Eric Watt