I want to select some variables from my CSV file in R. I used `select(gender*, age*)` but got the error "object not found". I tried ``select(`gender*`, `age*`)`` and `select(starts_with(gender), starts_with(age))`, but neither works. Does anyone know how to select variables with star symbols in their names? Thanks a lot!

Xiaotong
-
Not able to reproduce `names(iris)[2] <- 'gender*';iris%>% head %>% select(`gender*`)` – akrun Apr 08 '20 at 21:27
-
Are you trying to use `*` as a wildcard, i.e. to select variables that look like, for example, `gender1`, `gender2`, etc.? – paqmo Apr 08 '20 at 21:40
-
Are you passing a data.frame or tibble into select as well? What exactly do your code and data look like? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 08 '20 at 21:44
2 Answers
It is possible that `select` from `dplyr` is masked by a `select` from another package, since the code below works fine for me. Either specify the package name with `::` or try this in a fresh R session with only `dplyr` loaded.

    library(dplyr)
    data(iris)
    iris$'gender*' <- 'M'
    iris %>%
      head %>%
      dplyr::select(`gender*`)
    #   gender*
    # 1       M
    # 2       M
    # 3       M
    # 4       M
    # 5       M
    # 6       M
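
To make the masking point concrete, here is a minimal sketch. It assumes both `dplyr` and `MASS` are installed; `MASS` is used only as an example of a package that also exports a function named `select`, and the asker's session may have involved a different package. The data frame `df` is made up for illustration.

```r
library(dplyr)
library(MASS)  # attached after dplyr, so MASS::select now masks dplyr::select

# Which package does a bare `select` resolve to in this session?
environmentName(environment(select))

df <- head(iris)
df$`gender*` <- "M"  # non-syntactic column name, so it needs backticks

# Qualifying the call with dplyr:: sidesteps the masking
res <- dplyr::select(df, `gender*`)
names(res)
```

If the first call reports a package other than `dplyr`, a bare `select()` is not the `dplyr` verb, which could explain unexpected errors.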

akrun
-
Thanks akrun! I'm very new to R, this is my first time knowing iris. Where should I put my dataframe? Could you please explain more? thank you. – Xiaotong Apr 08 '20 at 21:49
-
@Xiaotong The main point is to put names with special characters between backticks. As for `iris`, it's one of the built-in data sets that come with R. It is often used to give self-contained examples because it has a mix of numeric and categorical variables. You don't have to put it anywhere; it's just there as a test data set when you need one. – Rui Barradas Apr 08 '20 at 22:00
-
Thanks both, problem solved! I have just found that read.csv has automatically turned all my special characters (including spaces and stars) into periods. So I was using the wrong names! – Xiaotong Apr 08 '20 at 22:34
-
@Xiaotong In `read.csv`, there is an argument `check.names`, which is `TRUE` by default. You can change it to `FALSE`, and then you get the exact column names without the `.` – akrun Apr 08 '20 at 22:37
To select the columns whose names start with a specific string, one can use the `starts_with()` function in `dplyr`, passing the prefix as a quoted string. To illustrate, we'll select the two columns of `iris` that start with the string `Sepal`, namely `Sepal.Length` and `Sepal.Width`.

    library(dplyr)
    select(iris,starts_with("Sepal")) %>% head()

...and the output:

    > select(iris,starts_with("Sepal")) %>% head()
      Sepal.Length Sepal.Width
    1          5.1         3.5
    2          4.9         3.0
    3          4.7         3.2
    4          4.6         3.1
    5          5.0         3.6
    6          5.4         3.9
    >

We can do the same thing in base R with `grepl()` and a regular expression.

    # base R version
    head(iris[,grepl("^Sepal",names(iris))])

...and the output:

    > head(iris[,grepl("^Sepal",names(iris))])
      Sepal.Length Sepal.Width
    1          5.1         3.5
    2          4.9         3.0
    3          4.7         3.2
    4          4.6         3.1
    5          5.0         3.6
    6          5.4         3.9
    >
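
For the asker's situation, the same `grepl()` idea extends to several prefixes at once. The sketch below uses a small made-up data frame; the column names `gender.1`, `age.1`, and so on are assumptions for illustration, not the asker's actual data.

```r
# Hypothetical data frame standing in for the asker's CSV
df <- data.frame(
  gender.1 = c("M", "F"),
  gender.2 = c("F", "F"),
  age.1    = c(34, 29),
  id       = 1:2
)

# Keep columns whose names start with "gender" or "age"
keep <- grepl("^(gender|age)", names(df))
selected <- df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
names(selected)
```

With `dplyr`, the equivalent call would be `select(df, starts_with("gender"), starts_with("age"))`. Note that the prefixes are quoted strings, unlike in the attempt shown in the question.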
Also note that if one is using `read.csv()` to create a data frame in R, by default (`check.names = TRUE`) it converts any occurrences of `*` in column headings to `.`, because `*` is not allowed in a syntactic R name.

    # confirm that * is converted to . in read.csv()
    textFile <- 'v*1,v*2
    1,2
    3,4
    5,6'
    data <- read.csv(text = textFile,header = TRUE)
    # see how illegal column name * is converted to .
    names(data)

...and the output:

    > names(data)
    [1] "v.1" "v.2"
    >
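
If the original headings should be kept, `read.csv()` has a `check.names` argument (mentioned in the comments on the other answer) that can be set to `FALSE`. A minimal sketch reusing the same inline text; the object name `raw` is chosen only for illustration:

```r
textFile <- 'v*1,v*2
1,2
3,4
5,6'

# check.names = FALSE leaves the headings exactly as they appear in the file
raw <- read.csv(text = textFile, header = TRUE, check.names = FALSE)
names(raw)

# Non-syntactic names must then be quoted: backticks with $, or a string with [[ ]]
raw$`v*1`
raw[["v*2"]]
```

The trade-off is that every later reference to these columns needs backticks or quotes, so keeping the converted names (`v.1`, `v.2`) may be more convenient in practice.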

Len Greski