
I want to select some variables from my CSV file in R. I used select(gender*, age*), but got the error "object not found". I also tried select(`gender*`, `age*`) and select(starts_with(gender), starts_with(age)), but neither works. Does anyone know how to select variables whose names contain an asterisk (*)? Thanks a lot!

Xiaotong
  • Not able to reproduce: names(iris)[2] <- 'gender*'; iris %>% head %>% select(`gender*`) – akrun Apr 08 '20 at 21:27
  • Are you trying to use * as a wildcard, to select variables that look like, for example, `gender1`, `gender2`, etc.? – paqmo Apr 08 '20 at 21:40
  • Are you passing a data.frame or tibble into select as well? What exactly do your code and data look like? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 08 '20 at 21:44

2 Answers


It is possible that dplyr's select() is masked by a select() from another package (MASS, for example, also exports one), because this works fine in a clean session. Either qualify the call with the package name, as in dplyr::select(), or run the code in a fresh R session with only dplyr loaded.

library(dplyr)
data(iris)
iris$'gender*' <- 'M'
iris%>% 
      head %>% 
      dplyr::select(`gender*`)
#   gender*
#1       M
#2       M
#3       M
#4       M
#5       M
#6       M
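As a small sketch (assuming dplyr 1.0 or later, where all_of() is available, and a hypothetical data frame df built with check.names = FALSE so the asterisks survive), the same non-syntactic names can be given with backticks, passed as strings through all_of(), or used with base R character indexing:

```r
library(dplyr)

# Hypothetical data frame whose column names contain "*";
# check.names = FALSE stops data.frame() from rewriting them
df <- data.frame(`gender*` = c("M", "F"), `age*` = c(30, 41),
                 check.names = FALSE)

# 1. Backticks around the non-syntactic names
a <- dplyr::select(df, `gender*`, `age*`)

# 2. The same names passed as plain strings via all_of()
b <- dplyr::select(df, all_of(c("gender*", "age*")))

# 3. Base R: character indexing needs no backticks at all
base_sel <- df[, c("gender*", "age*")]

names(a)
#> [1] "gender*" "age*"
```

Passing strings through all_of() is convenient when the column names come from a variable rather than being typed into the call.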
akrun
  • Thanks akrun! I'm very new to R; this is my first time seeing iris. Where should I put my data frame? Could you please explain more? Thank you. – Xiaotong Apr 08 '20 at 21:49
  • @Xiaotong The main point is to put names with special characters between back ticks. As for `iris`, it's one of the built-in data sets that comes with R. It is many times used to give self-contained examples because it has a mix of numeric and categorical variables. You don't have to put it anywhere, it's just there, to be a test data set when you need one. – Rui Barradas Apr 08 '20 at 22:00
  • Thanks both, problem solved! I have just found that read.csv has automatically turned all my special characters (including spaces and stars) into periods. So I was using the wrong names! – Xiaotong Apr 08 '20 at 22:34
  • @Xiaotong In `read.csv` there is an argument `check.names`, which is `TRUE` by default. If you change it to `FALSE`, you get the exact column names without the `.` – akrun Apr 08 '20 at 22:37
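A minimal sketch of the check.names suggestion, using a hypothetical two-column CSV passed through the text argument of read.csv() instead of a real file:

```r
library(dplyr)

# Hypothetical CSV content with "*" in the headers
csv_text <- "gender*,age*\nM,30\nF,41"

# Default (check.names = TRUE): the headers are rewritten to syntactic names
default_names <- names(read.csv(text = csv_text))
default_names
#> [1] "gender." "age."

# check.names = FALSE keeps the headers exactly as they appear in the file
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)
#> [1] "gender*" "age*"

# The original names can now be selected with backticks
sel <- dplyr::select(raw, `gender*`, `age*`)
```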

To select the columns whose names start with a specific string, one can use the starts_with() function in dplyr, which takes the prefix as a quoted string. To illustrate, we'll select the two columns that start with the string Sepal, namely Sepal.Length and Sepal.Width.

library(dplyr)
select(iris,starts_with("Sepal")) %>% head()

...and the output:

> select(iris,starts_with("Sepal")) %>% head()
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
>
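Applied to the question, assuming the * was meant as a wildcard for numbered columns such as gender1 and gender2 (the data frame below is hypothetical), the prefixes only need to be quoted. Note that * is not a wildcard inside select(); if a regular expression is needed, tidyselect's matches() can be used instead.

```r
library(dplyr)

# Hypothetical data frame with numbered gender/age columns
df <- data.frame(gender1 = c("M", "F"), gender2 = c("F", "F"),
                 age1 = c(30, 41), height = c(170, 165))

# Quoted prefixes: starts_with("gender"), not starts_with(gender)
picked <- select(df, starts_with("gender"), starts_with("age"))
names(picked)
#> [1] "gender1" "gender2" "age1"
```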

We can do the same thing in Base R with grepl() and a regular expression.

# base R version
head(iris[,grepl("^Sepal",names(iris))])

...and the output:

> head(iris[,grepl("^Sepal",names(iris))])
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
>
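The base R approach extends to several prefixes with regex alternation. A sketch on the same kind of hypothetical data frame:

```r
# Hypothetical data frame with numbered gender/age columns
df <- data.frame(gender1 = c("M", "F"), gender2 = c("F", "F"),
                 age1 = c(30, 41), height = c(170, 165))

# One pattern with alternation matches either prefix at the start of the name;
# drop = FALSE keeps a data frame even if only one column matches
picked <- df[, grepl("^(gender|age)", names(df)), drop = FALSE]
names(picked)
#> [1] "gender1" "gender2" "age1"
```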

Also note that if one uses read.csv() to create a data frame in R, it converts any occurrence of * in column headings to a period (.).

# confirm that * is converted to . in read.csv()
textFile <- 'v*1,v*2
1,2
3,4
5,6'
data <- read.csv(text = textFile,header = TRUE)
# see how the illegal character * in the column names is converted to .
names(data)

...and the output:

> names(data)
[1] "v.1" "v.2"
> 
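The renaming is done by make.names(), which read.csv() applies to the header (with unique = TRUE) when check.names = TRUE. A small sketch to preview the converted names and then select a column by its converted name:

```r
# Header text as it appears in the CSV file
original <- c("v*1", "v*2")

# Names read.csv() produces by default
converted <- make.names(original, unique = TRUE)
converted
#> [1] "v.1" "v.2"

# Re-create the example data and select a column by its converted name
data <- read.csv(text = "v*1,v*2\n1,2\n3,4\n5,6")
data[, converted[1]]
#> [1] 1 3 5
```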
Len Greski