Specifying colClasses in the read.csv

Question

I am trying to specify the colClasses argument of the read.csv function in R. In my data, the first column, time, is a character vector, while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" , 
                 colClasses=c(time="character", "numeric"), 
                 strip.white=FALSE)

In the above command, I want R to read in the time column as "character" and the rest as numeric. Although the data variable held the correct result after the command completed, R returned the following warnings. How can I fix these warnings?

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, : not all columns named in 'colClasses' exist
2: In tmp[i[i > 0L]] <- colClasses : number of items to replace is not a multiple of replacement length

Derek

score 201 · Answer 1 · edited Nov 18 '11 at 16:55

You can specify colClasses for just one column.

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))
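To see why this avoids the warnings, here is a minimal sketch using a made-up three-column file written to a temporary path (the file contents and the names csv_path and col_classes are assumptions for illustration, not the asker's data). Only the named column is forced to character; read.csv guesses the classes of the others.

```r
# Write a small sample file to a temporary location (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("time,x,y",
             "00:01,1.5,2",
             "00:02,2.5,3"), csv_path)

# Name only the column whose class should be forced.
# Every entry of colClasses carries a name that exists in the header,
# so read.csv has nothing unmatched to warn about.
data <- read.csv(csv_path, colClasses = c(time = "character"))

# Inspect the resulting column classes.
col_classes <- sapply(data, class)
print(col_classes)
```

In the original call, c(time="character", "numeric") mixes a named entry with an unnamed one; the unnamed entry has an empty name that matches no column, which appears to be what triggers the "not all columns named in 'colClasses' exist" warning.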

edited Nov 18 '11 at 16:55

cwallenpoole

answered Nov 18 '11 at 16:38

Etienne

24

Not that it matters much, but I found this to work without quoting the column name. – Hendy Apr 08 '14 at 00:59
This approach is actually very useful when trying to read quoted integers as character. Thanks! – nils-holmberg Feb 21 '20 at 11:34

score 88 · Accepted Answer · answered May 10 '10 at 18:36

When colClasses is given as an unnamed vector, its entries are matched to columns by position, so supply one class per imported column. Supposing there are 5 remaining columns in your dataset:

colClasses=c("character",rep("numeric",5))
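If the number of columns is not known in advance, one possible sketch is to read just the header line with scan, count the fields, and build the positional vector from that count. The sample file and the names csv_path, header_fields, n_cols and classes below are assumptions for illustration.

```r
# Illustrative file: one character column followed by three numeric columns.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("time,a,b,c",
             "00:01,1,2,3",
             "00:02,4,5,6"), csv_path)

# Read only the first line to find out how many columns there are.
header_fields <- scan(csv_path, what = "character", sep = ",",
                      nlines = 1, quiet = TRUE)
n_cols <- length(header_fields)

# First column as character, all remaining columns as numeric.
classes <- c("character", rep("numeric", n_cols - 1))
data <- read.csv(csv_path, colClasses = classes)

str(data)
```

This keeps the positional approach but avoids hard-coding the column count.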

answered May 10 '10 at 18:36

gd047

7

one can probably use the following to read the first line of the csv and determine how many columns there are. scan(csv,sep=',', what="character" , nlines=1 ) – defoo May 10 '10 at 19:53
37

This actually is an incorrect answer and threw me off for a little while. The correct answer is below. Not trying to be a jerk, just wanted to make sure it doesn't happen to anyone else. – Rob Nov 08 '12 at 14:33
4

@Rob In my case, this is still the correct answer, when you also need to specify the classes of the other variables, and they are not automatically recognized as such by `read.table`. – tchakravarty Dec 13 '14 at 18:00

score 14 · Answer 3 · answered May 10 '10 at 23:19

Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns contain only numbers, 'read.csv's default (before R 4.0.0) is to read 'time' as a 'factor' and the remaining columns as 'numeric'. Setting 'stringsAsFactors=FALSE' therefore gives the same result as setting 'colClasses' manually, i.e.,

data <- read.csv('test.csv', stringsAsFactors=FALSE)
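A small sketch of the difference, using a made-up file (the data and the names csv_path, data_chr and data_fct are assumptions for illustration):

```r
# Illustrative file whose time column contains non-numeric characters.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("time,x",
             "00:01,1.5",
             "00:02,2.5"), csv_path)

# Strings are kept as character vectors.
data_chr <- read.csv(csv_path, stringsAsFactors = FALSE)

# Strings are converted to factors.
data_fct <- read.csv(csv_path, stringsAsFactors = TRUE)

print(class(data_chr$time))
print(class(data_fct$time))
```

Note that this relies on the assumption stated above: if every value in the time column looked numeric, read.csv would convert it to a number regardless of stringsAsFactors, and colClasses would still be needed.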

score 12 · Answer 4 · edited Mar 16 '13 at 20:49

If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)

edited Mar 16 '13 at 20:49

rslite

answered Dec 19 '11 at 19:53

scentoni

score 10 · Answer 5 · answered Sep 14 '18 at 16:41

I know the OP asked about the utils::read.csv function, but let me provide an answer for those who come here searching for how to do it using readr::read_csv from the tidyverse.

library(readr)
# col_names is left at its default (TRUE) so that `time` can be matched against the header row
read_csv("test.csv", col_types = cols(.default = "c", time = "i"))

This should set the default type for all columns as character, while time would be parsed as integer.

score 5 · Answer 6 · answered May 10 '17 at 21:50

For a file with no header and a lot of columns, where several columns hold datetimes: say my datetime fields are in columns 36 and 38, and I want them read in as character fields:

data <- read.csv("test.csv", header=FALSE, colClasses=c("V36"="character","V38"="character"))
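As a smaller illustration of the same idea, here is a sketch with a made-up three-column headerless file (the data and the name csv_path are assumptions). With header = FALSE, read.csv assigns the default names V1, V2, ..., and those names can be used in colClasses.

```r
# Illustrative headerless file; the second field is a datetime string.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("1,2011-11-18 16:38,3.5",
             "2,2011-11-18 16:55,4.5"), csv_path)

# Columns get the default names V1, V2, V3; force only V2 to character.
data <- read.csv(csv_path, header = FALSE,
                 colClasses = c(V2 = "character"))

str(data)
```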

score 0 · Answer 7 · answered Nov 02 '18 at 17:35

If we combine what @Hendy and @Oddysseus Ithaca contributed, we get a cleaner and more general (i.e., adaptable?) chunk of code.

    data <- read.csv("test.csv", header = FALSE, colClasses = c(V36 = "character", V38 = "character"))
