Drop column by name using [ without assigning name to data frame/ matrix object

Question

@Joris Meys' great answer to this famous question suggests dropping columns by name using a list of names. It requires first assigning the data frame/matrix to a name and then using names(df), or colnames(matrix) for matrices.

Out of curiosity, I wondered whether a similar strategy is possible without assigning the data frame/matrix to a name in the first place. I was pondering this while answering this question (from which I took my sample data).

My suggested solution drops the column with select as follows:

bind_cols(split(df$b, df$year)) %>% select(-'1997')

I first tried to use do.call(cbind, split(df$b, df$year)) instead, but this gave a matrix, and dplyr::select did not like that. I could of course select positively:

do.call(cbind, split(df$b, df$year))[,c('1996','1998')]

I could also use subset:

subset(do.call(cbind, split(df$b, df$year)), select = - `1997`)

My question is how to use [ for 'negative selection' by name (here: dropping 1997) without first assigning the matrix/data frame to a name, i.e. in a one-liner.

data

set.seed(77)
df <- data.frame(year = rep(1996:1998,3), a = runif(9), b = runif(9), e = runif(9))

# required result something like: (result from code above)   

          1996      1998
[1,] 0.4569087 0.9881951
[2,] 0.1658851 0.4475605
[3,] 0.3647157 0.7033574

It's not clear to me how your `df` would result in the desired output (without some extra transformation). Can you talk more about the process? — Roman Luštrik, Sep 09 '18 at 07:07
Not really an answer to the question but you could assign `NULL` to column `1997` using `"[<-"`, i.e. `bind_cols(split(df$b, df$year)) %>% \`[<-\`(., '1997', value = NULL)` — markus, Sep 09 '18 at 07:09
@RomanLuštrik with my above code using `subset(do.call(cbind, split(df$b, df$year)), select = - '1997')`. Note that I have changed `set.seed` from 88 to 77 a minute after I have originally posted the question - maybe you were very fast and have used the first data .... ? — tjebo, Sep 09 '18 at 07:10
@markus this would be a way for sure - however you are right, it is not exactly what I meant — tjebo, Sep 09 '18 at 07:14
@Sotos using `which` and the information of the original df would also be a good way, this is true. Thanks for pointing out to `cbind.data.frame`, very useful :) — tjebo, Sep 09 '18 at 07:17
How about subsetting out 1997 and reshaping the data afterwards into a long format? — Roman Luštrik, Sep 09 '18 at 07:23
@RomanLuštrik Thanks for the thoughts. This is basically what @akrun did in his answer to the original question. My question is more of a theoretical nature, if a 'negative subset' by name with `[` is possible without having to assign a name to the df/matrix first — tjebo, Sep 09 '18 at 07:25

score 1 · Accepted Answer · answered Sep 09 '18 at 07:32


There are obviously many ways to achieve this, but if you just want a negative subset with [, one way is to use your original data frame to find the first position of your target year and use that position to remove the column, i.e.

do.call(cbind, split(df$b, df$year))[,-which(df$year == '1997')[1]]

which gives,

          1996      1998
[1,] 0.4569087 0.9881951
[2,] 0.1658851 0.4475605
[3,] 0.3647157 0.7033574

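NOTE 1: This relies on row order: the first occurrence of each year must appear in the first rows of your data frame, in ascending order of year (as in your sample data), so that the row position of the first 1997 matches its column position after split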

NOTE 2: You can use cbind.data.frame to get your output as a data frame
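The dependence on row order can be relaxed by computing the column position from the sorted unique years instead of from row positions. This is only a sketch, assuming that split() orders its groups by the sorted unique values of the grouping vector (which is what happens when it coerces year to a factor). The names shuffled, pos and res are made up for the illustration, and the approach still needs the original data frame.

```r
set.seed(77)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

# Shuffle the rows to show that the result no longer depends on row order
shuffled <- df[sample(nrow(df)), ]

# Position of 1997 among the sorted unique years,
# which is also its column position after split() + cbind
pos <- match(1997, sort(unique(shuffled$year)))

res <- do.call(cbind, split(shuffled$b, shuffled$year))[, -pos]
colnames(res)
#> [1] "1996" "1998"
```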

answered Sep 09 '18 at 07:32

Sotos


I think this is a good idea - but somehow it leaves me unsatisfied that it would require the information of the previous data structure and moreover be dependent on correct sorting. It is one very good and interesting answer to the question. I will wait with accepting it as *the* answer though if you don’t mind. – tjebo Sep 09 '18 at 07:48
@Tjebo No problem. I'm also curious to see other options – Sotos Sep 09 '18 at 07:59
Although not really what I intended, you have answered this question correctly. I have posted a follow-up question [here](https://stackoverflow.com/q/52401049/7941188). Would be happy to have your thoughts on it :) – tjebo Sep 19 '18 at 08:07

AndS. · Answer 2 · 2018-09-09T19:05:49.097

This doesn't select columns by name, but what if you filter the rows before calling split, using [ for negative selection?

do.call(cbind, split(df[-which(df$year == 1997),"b"], df[-which(df$year == 1997), "year"]))
#>           1996      1998
#> [1,] 0.4569087 0.9881951
#> [2,] 0.1658851 0.4475605
#> [3,] 0.3647157 0.7033574

or maybe a super long one-liner for negative column indexing

do.call(cbind, split(df$b, df$year))[,-which(colnames(do.call(cbind, split(df$b, df$year))) == "1997")]
#>           1996      1998
#> [1,] 0.4569087 0.9881951
#> [2,] 0.1658851 0.4475605
#> [3,] 0.3647157 0.7033574

Although, you could condense it with a pipe

do.call(cbind, split(df$b, df$year)) %>%  .[,-which(colnames(.) == "1997")]
#>           1996      1998
#> [1,] 0.4569087 0.9881951
#> [2,] 0.1658851 0.4475605
#> [3,] 0.3647157 0.7033574
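One caveat with the -which(...) idiom: if no column matches, which() returns integer(0), and negating it still gives integer(0), so every column is dropped instead of none. A minimal sketch of the difference, using logical indexing as the alternative; drop_which and drop_logical are hypothetical helper names used only for this illustration.

```r
set.seed(77)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

# Hypothetical helpers, defined only to compare the two idioms
drop_which   <- function(m, nm) m[, -which(colnames(m) == nm), drop = FALSE]
drop_logical <- function(m, nm) m[, colnames(m) != nm, drop = FALSE]

# No column is called "2001": -which() becomes integer(0) and selects nothing
ncol(drop_which(do.call(cbind, split(df$b, df$year)), "2001"))
#> [1] 0

# Logical indexing keeps all columns when nothing matches
ncol(drop_logical(do.call(cbind, split(df$b, df$year)), "2001"))
#> [1] 3

# and drops the named column when it exists
colnames(drop_logical(do.call(cbind, split(df$b, df$year)), "1997"))
#> [1] "1996" "1998"
```

drop = FALSE keeps the result a matrix even if only one column remains.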

thanks for the thoughts! These are all good ideas, and your code is working for this example. But it was actually meant more as a theoretical consideration if this works in general, rather than for this particular example (should have clarified this better, will try to improve the wording of the question), - ideally without the need of using the previously defined / assigned data frame as help. — tjebo, Sep 09 '18 at 19:58

juarpasi · Answer 3 · 2023-06-11T17:33:45.853

What about this?

In one line calling an anonymous function

(function(df) df[!names(df) %in% c('1997')])(as.data.frame(do.call(cbind, split(df$b, df$year))))
#       1996      1998
#1 0.2309219 0.9199970
#2 0.7308675 0.1856637
#3 0.6101509 0.6482355

The expression as.data.frame(do.call(cbind, split(df$b, df$year))) is passed as the argument to the anonymous function. I think this option does not require assigning a name and does not depend on information from the previous object, since names(df) inside the anonymous function refers to the names of its argument.

But we can declare it as a named function and use the native pipe operator |> to make it more readable:

dropByNames <- function(df, toDrop) df[!names(df) %in% toDrop]

df |>
  with(split(b, year)) |>
  list2DF() |>
  dropByNames('1997')
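Note that names() returns NULL for a matrix, so dropByNames as written is meant for data frames. A possible variant that accepts either a matrix or a data frame could rely on colnames() instead; this is only a sketch, and dropColsByName is a hypothetical name, not part of any package.

```r
set.seed(77)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

# Hypothetical variant: colnames() works for matrices and data frames alike,
# and drop = FALSE preserves the input's class when one column is left
dropColsByName <- function(x, toDrop) x[, !colnames(x) %in% toDrop, drop = FALSE]

# Matrix input, without assigning the intermediate matrix to a name
colnames(dropColsByName(do.call(cbind, split(df$b, df$year)), "1997"))
#> [1] "1996" "1998"

# Data frame input
colnames(dropColsByName(as.data.frame(do.call(cbind, split(df$b, df$year))), "1997"))
#> [1] "1996" "1998"
```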
