I have a large data frame in wide (spread) format:

df: a1 a2 a3 a4 a5 ...............
    r  w  sd w  y ........

I have another input which is a subset of df.

subset_df: a3 a4 a5
           f  e  u 

My goal is to take the column names of subset_df, select these columns in df and continue from there (in my case to compare the values).

When I do this the simple way:

df[, names(subset_df)] works, but why does the same selection fail with dplyr's select()?

Here is the error when running:

names_sub_df <- names(subset_df)
df %>% select(names_sub_df)


Error: All select() inputs must resolve to integer column positions.
The following do not:
*  as.vector(names_sub_df)

Here is a reproducible example:

key <- c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17", "a18")

value <- c("G", "CTT", "C", "C", "G", "C", "T", "C", "C", "C", "G", "T", "C", "G", "T", "A", "T", "G")


test2 <- data.frame(key, value, stringsAsFactors = FALSE)

library(tidyr)

Dr. Richard Tennen

2 Answers

2

In the absence of a minimal reproducible example, I'll use mtcars to illustrate.

You can wrap your subset dataframe in colnames so select uses the names, not the whole dataframe, for the subsetting:

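# Build a subset data frame holding three of mtcars' columns
cols <- c("hp", "drat", "wt")
subset_mtcars <- mtcars[, cols]
subset_mtcars

library("tidyverse")
# Pass the subset's column names, not the data frame itself, to select()
mtcars %>%
  select(colnames(subset_mtcars))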

#                      hp drat    wt
# Mazda RX4           110 3.90 2.620
# Mazda RX4 Wag       110 3.90 2.875
# Datsun 710           93 3.85 2.320
# ...
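
As a hedged aside: if the names are already stored in a character vector, as in the question, recent dplyr versions (1.0.0 and later, an assumption about your installation) provide all_of() to say so explicitly. The vector name `wanted` is made up for this sketch; on older versions, one_of() played a similar role.

```r
library(dplyr)

# Column names held in a character vector (hypothetical name `wanted`)
wanted <- c("hp", "drat", "wt")

# all_of() tells select() to treat `wanted` as a vector of column names;
# it raises an error if any of the names is missing from the data frame
picked <- mtcars %>%
  select(all_of(wanted))

# Base R selection of the same columns, for comparison
picked_base <- mtcars[, wanted]

names(picked)
names(picked_base)
```

Both calls should return the same three columns in the order given by `wanted`.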
Phil
  • Thanks a lot for the answer. I am sorry for not providing an example; I thought my explanation would be enough. Could you please tell me why the error occurred? – Dr. Richard Tennen Nov 26 '17 at 11:43
  • Because you are passing a list of quoted strings and select wants unquoted names. There are a number of answers to this question if you search. – Elin Nov 26 '17 at 11:55
  • @elin I don't think that's quite right; I think the OP was trying to pass a data frame to `select`. Admittedly that's still not the correct structure – Phil Nov 26 '17 at 12:03
  • @Elin It worked perfectly on my PC, but when I transferred the same code to my work PC, it failed. I will check again. – Dr. Richard Tennen Nov 26 '17 at 12:04
  • I would also suggest always using the namespace notation when you are moving things to different computers since you may or may not have dplyr loaded. @Phil `names_sub_df <- names(subset_df)` is a character vector. Here's another duplicate with a `select()` example https://stackoverflow.com/questions/33284790/using-dplyrs-select-where-variable-names-are-quoted. – Elin Nov 26 '17 at 12:12
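
A minimal sketch of the namespace notation mentioned in the comments, assuming dplyr 1.0.0 or later is installed (it does not need to be attached). The vector `cols` and the name "not_a_column" are invented for illustration.

```r
# No library(dplyr) call here: each function is qualified with `dplyr::`,
# so the code does not depend on which packages happen to be attached.
cols <- c("hp", "wt", "not_a_column")  # hypothetical names; the last is absent

# any_of() selects the names that exist and silently skips the rest
res <- dplyr::select(mtcars, dplyr::any_of(cols))

names(res)
```

Swapping any_of() for all_of() makes the call fail on the missing name instead, which may be preferable when a missing column should be treated as a bug.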
0

From your example, I am not sure whether you want to select columns or the values in a column. If it is the latter, the following will do the job:

subset_df <- c("a3", "a4", "a5")
test2[test2$key %in% subset_df, ]
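
For comparison, roughly the same row filter can be written with dplyr. This is only a sketch: it uses a shortened copy of the question's test2 (first five rows) and a made-up vector name, `subset_keys`.

```r
library(dplyr)

# Shortened copy of the question's long-format data (first five rows only)
test2 <- data.frame(
  key = c("a1", "a2", "a3", "a4", "a5"),
  value = c("G", "CTT", "C", "C", "G"),
  stringsAsFactors = FALSE
)

# Keys to keep (hypothetical name)
subset_keys <- c("a3", "a4", "a5")

# Keep only the rows whose key appears in subset_keys
long_subset <- test2 %>%
  filter(key %in% subset_keys)

long_subset
```

Unlike select(), filter() works on rows, so this applies only while the data is still in long (key/value) form.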
Christoph