122

I am trying to specify the colClasses options in the read.csv function in R. In my data, the first column time is basically a character vector, while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" , 
                 colClasses=c(time="character", "numeric"), 
                 strip.white=FALSE)

In the above command, I want R to read in the time column as "character" and the rest as numeric. Although the data variable did have the correct result after the command completed, R returned the following warnings. I am wondering how I can fix these warnings?

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, : not all columns named in 'colClasses' exist
2: In tmp[i[i > 0L]] <- colClasses : number of items to replace is not a multiple of replacement length

Derek

Peter
defoo

7 Answers

201

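You can specify colClasses for only one column. A named colClasses vector is matched against the header by name, and any column you leave out is type-guessed as usual. The warnings arise because your vector mixes a named entry with the unnamed "numeric" entry, whose empty name matches no column.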

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))
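
A minimal sketch of the same call against a small temporary file, to make it reproducible. The file contents and the column names `a` and `b` are made up for illustration; only `time` comes from the question.

```r
# Hypothetical sample data written to a temporary file for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("time,a,b",
             "00:01,1.5,2",
             "00:02,2.5,3"), tmp)

# Only 'time' is named; the remaining columns are type-guessed by read.csv.
data <- read.csv(tmp, colClasses = c(time = "character"))

# Inspect the class each column ended up with.
sapply(data, class)
```

Note that a guessed column holding only whole numbers comes back as integer rather than double; both count as numeric for `is.numeric()`.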
cwallenpoole
Etienne
88

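An unnamed colClasses vector is matched to columns by position (and recycled if it is shorter), so to set the class of every column explicitly it should have one entry per imported column. Supposing there are 5 remaining columns in your dataset: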

colClasses=c("character",rep("numeric",5))
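
If the number of columns is not known in advance, it can be read from the header line first. The following is only a sketch under assumptions: the temporary file and its column names are invented for illustration, and it assumes a comma-separated file whose first line is the header.

```r
# Hypothetical sample data written to a temporary file for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("time,a,b,c",
             "00:01,1,2,3",
             "00:02,4,5,6"), tmp)

# Read only the header line to learn how many columns the file has.
header <- scan(tmp, what = "character", sep = ",", nlines = 1, quiet = TRUE)
n_cols <- length(header)

# First column character, every remaining column numeric.
classes <- c("character", rep("numeric", n_cols - 1))
data <- read.csv(tmp, colClasses = classes)
```

Unlike type guessing, this forces the remaining columns to double even when they contain only whole numbers.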
gd047
    One can probably use the following to read the first line of the CSV and determine how many columns there are: `scan(csv, sep=',', what="character", nlines=1)` – defoo May 10 '10 at 19:53
    This actually is an incorrect answer and threw me off for a little while. The correct answer is below. Not trying to be a jerk, just wanted to make sure it doesn't happen to anyone else. – Rob Nov 08 '12 at 14:33
    @Rob In my case, this is still the correct answer, when you also need to specify the classes of the other variables, and they are not automatically recognized as such by `read.table`. – tchakravarty Dec 13 '14 at 18:00
14

Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns contain only numbers, 'read.csv' in R versions before 4.0.0 will by default read 'time' as a 'factor' and all the other columns as 'numeric'. Setting 'stringsAsFactors=FALSE' therefore gives the same result as setting 'colClasses' manually (from R 4.0.0 onwards this is already the default), i.e.,

data <- read.csv('test.csv', stringsAsFactors=FALSE)
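
A small sketch of the difference, with `stringsAsFactors` set explicitly in both calls so the result does not depend on the R version. The temporary file and the column name `a` are made up for illustration.

```r
# Hypothetical sample data written to a temporary file for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("time,a",
             "00:01,1.5",
             "00:02,2.5"), tmp)

# Same file, read with and without conversion of strings to factors.
with_factors    <- read.csv(tmp, stringsAsFactors = TRUE)
without_factors <- read.csv(tmp, stringsAsFactors = FALSE)

class(with_factors$time)     # "factor"
class(without_factors$time)  # "character"
```

This relies on the stated assumption that 'time' contains something non-numeric; a column of values such as 1200 would still be guessed as a number, in which case colClasses is needed.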
wkmor1
12

If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)
rslite
scentoni
10

I know the OP asked about the utils::read.csv function, but let me provide an answer for those who come here searching for how to do it using readr::read_csv from the tidyverse.

library(readr)
# The header row is read by default, so the column can be referred to as 'time'.
read_csv("test.csv", col_types = cols(.default = "c", time = "i"))

This should set the default type for all columns as character, while time would be parsed as integer.
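
To match the layout in the question instead (time as character, everything else numeric), the roles can be swapped. This is a sketch assuming the readr package is installed; the temporary file and the column names `a` and `b` are invented for illustration.

```r
library(readr)

# Hypothetical sample data written to a temporary file for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("time,a,b",
             "00:01,1.5,2",
             "00:02,2.5,3"), tmp)

# 'time' is parsed as character; every other column defaults to double.
data <- read_csv(tmp, col_types = cols(time = col_character(),
                                       .default = col_double()))
```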

elcortegano
5

For a file with no header and a lot of columns, where, say, the datetime fields are in columns 36 and 38 and should be read in as character fields, refer to them by the default names V36 and V38:

data <- read.csv("test.csv", header=FALSE, colClasses=c("V36"="character","V38"="character"))
0

If we combine what @Hendy and @Oddysseus Ithaca contributed, we get a cleaner and more general (i.e., more adaptable) chunk of code.

    data <- read.csv("test.csv", header = FALSE, colClasses = c(V36 = "character", V38 = "character"))
seapen