I have a .csv file like this (except that the real .csv file has many more columns):

library(tidyverse)

tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6), 
       data2s = c(7, 8)) %>% 
  write_csv("df.csv")

I only want id1, id2, data1, and data2.

I can do this:

df <- read_csv("df.csv", 
               col_names = TRUE,
               cols_only(id1 = col_character(),
                         id2 =  col_character(),
                         data1 = col_integer(),
                         data2 = col_integer()))

But, as mentioned above, my real dataset has many more columns, so I'd like to use tidyselect helpers to only read in specified columns and ensure specified formats.

I tried this:

df2 <- read_csv("df.csv",
         col_names = TRUE,
         cols_only(starts_with("id") = col_character(),
                   starts_with("data") & !ends_with("s") =  col_integer()))

But the error message indicates that there's a problem with the syntax. Is it possible to use tidyselect helpers in this way?

nicholas
  • The current read_csv function doesn't have a cols_only argument, if I'm not mistaken. You can use col_select, though, which should allow for tidyselect helpers. – deschen Sep 07 '22 at 22:57
  • It does have a cols_only function (my second code chunk above works, and I'm using the most recent version of readr). I'd be happy to use col_select, but in that case, how do I specify the column types with tidyselect? – nicholas Sep 07 '22 at 23:00
  • @deschen `cols_only()` is a function used in the `col_types` argument. – Darren Tsai Mar 14 '23 at 03:59
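
For illustration, one way the `col_select` route raised in the comments might look. This is a sketch rather than something confirmed in the thread: it assumes readr 2.0 or later, where `read_csv()` has a `col_select` argument that accepts tidyselect helpers. Since `col_select` only chooses columns, the types are handled separately here by reading everything as character and converting afterwards; the name `df_sel` is made up for the example.

```r
library(tidyverse)

# Recreate the toy file from the question so the example is self-contained.
tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6),
       data2s = c(7, 8)) %>%
  write_csv("df.csv")

# col_select picks the columns with tidyselect helpers;
# col_types reads every selected column as character for now.
df_sel <- read_csv("df.csv",
                   col_select = c(starts_with("id"),
                                  starts_with("data") & !ends_with("s")),
                   col_types = cols(.default = col_character())) %>%
  # Convert the data columns to integer after reading.
  mutate(across(starts_with("data"), as.integer))
```

The trade-off is that the type conversion happens after parsing rather than as part of the read spec, so readr reports no parsing problems for values that fail to convert.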

1 Answer


My proposal is somewhat roundabout, but it does let you customise the read spec with rules rather than by listing every column explicitly:

library(tidyverse)

tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6), 
       data2s = c(7, 8)) %>% 
  write_csv("df.csv")

# Read a single row to build a spec with minimal reading; really just to get the column names
df_spec <- spec(read_csv("df.csv",
                         col_names = TRUE,
                         n_max = 1))

# Alter the spec with the base R functions startsWith() / endsWith()
df_spec$cols <- imap(df_spec$cols, ~ {
  if (startsWith(.y, "id")) {
    col_character()
  } else if (startsWith(.y, "data") && !endsWith(.y, "s")) {
    col_integer()
  } else {
    col_skip()
  }
})

df <- read_csv("df.csv",
               col_types = df_spec$cols)
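
A possible variant of the same idea that keeps the tidyselect helpers from the question is sketched below. It is not part of the approach above as written: it assumes that reading with `n_max = 0` returns a zero-row tibble carrying all the column names, and it resolves the helpers with `dplyr::select()` before building a `cols_only()` spec. The names `header`, `id_cols`, `data_cols`, `collectors` and `df_tidy` are made up for the example.

```r
library(tidyverse)

# Recreate the toy file so the example is self-contained.
tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6),
       data2s = c(7, 8)) %>%
  write_csv("df.csv")

# Read only the header: a zero-row tibble with every column name.
header <- read_csv("df.csv", n_max = 0, show_col_types = FALSE)

# Resolve the tidyselect expressions against the header to get column names.
id_cols   <- names(select(header, starts_with("id")))
data_cols <- names(select(header, starts_with("data") & !ends_with("s")))

# Build a named list of collectors, one entry per selected column.
collectors <- c(
  setNames(rep(list(col_character()), length(id_cols)), id_cols),
  setNames(rep(list(col_integer()), length(data_cols)), data_cols)
)

# cols_only() skips every column that is not named in the spec.
df_tidy <- read_csv("df.csv", col_types = do.call(cols_only, collectors))
```

Each rule is one `select()` call paired with one collector, so adding another rule means adding one more pair.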
Nir Graham