I am having a problem using the colClasses argument of read.xlsx (from the xlsx package).

I have the following data.frame

mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1")
head(mydata)
Treatment Nitrate_conc
1         1           12
2         1           12
3         1           15
4         1           16
5         1           12
6         2           18
str(mydata)
'data.frame':    20 obs. of  2 variables:
$ Treatment   : num  1 1 1 1 1 2 2 2 2 2 ...
$ Nitrate_conc: num  12 12 15 16 12 18 25 26 28 28 ...

I want to import Treatment as a factor. To do this, I have tried passing the colClasses argument as shown below:

mydata1 <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1", colClasses = c("Treatment" = "factor", "Nitrate_conc" = "numeric"))

However I get the following error:

Error in class(aux) <- colClasses[ic] : adding class factor to an invalid object

Can anyone point out what I am doing wrong?

MichaelChirico
Rory Shaw
  • I think it is more `colClasses = c("factor", "numeric")`. –  Sep 29 '15 at 14:13
  • @Pascal the [docs](http://www.inside-r.org/packages/cran/xlsx/docs/read.xlsx) suggest `colClasses` can take a named argument. – MichaelChirico Sep 29 '15 at 14:27
  • @RoryShaw have you checked `names(mydata)` to be sure there isn't perhaps an errant space? You may also consider using one of the other, [faster](http://stackoverflow.com/questions/6099243/read-an-excel-file-directly-from-a-r-script/31734198#31734198) Excel reading options... I personally only use `read.xlsx` when the file has a `Date` in it. – MichaelChirico Sep 29 '15 at 14:28
  • @MichaelChirico thanks for your help and link to other methods. – Rory Shaw Sep 29 '15 at 14:55
  • @Pascal pretty certain colClasses can take a named argument - it works with read.csv etc – Rory Shaw Sep 29 '15 at 14:55
  • @RoryShaw Pretty sure I never used it. But please do so. –  Sep 29 '15 at 22:17
  • Were you able to find a solution? – fahmy Dec 27 '15 at 04:40

1 Answer

This is an old question, but it looks like it was never fully answered.

This has nothing to do with whether or not the elements of the colClasses vector are named. The problem can be traced through the documentation, ?read.xlsx. In describing the colClasses parameter, it points to the documentation for readColumns. The description there says

Only numeric, character, Date, POSIXct, column types are accepted. Anything else will be coverted to a character type.

So specifying 'factor' is not permitted. Also note that under ... it says

other arguments to data.frame, for example stringsAsFactors

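So, we can read Treatment as character and let stringsAsFactors convert it to a factor. The argument is passed explicitly here because data.frame has defaulted to stringsAsFactors = FALSE since R 4.0.0:

mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1", 
  colClasses=c("character", "numeric"), stringsAsFactors = TRUE)
str(mydata)
'data.frame':   6 obs. of  2 variables:
 $ Treatment   : Factor w/ 2 levels "1","2": 1 1 1 1 1 2
 $ Nitrate_conc: num  12 12 15 16 12 18

You can also use the named form:

mydata <- read.xlsx("dataset_1.xlsx", sheetName = "dataset_1", 
    colClasses=c(Treatment = "character", Nitrate_conc = "numeric"),
    stringsAsFactors = TRUE)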

It looks like there is just the one stringsAsFactors parameter, which applies to every character column, so it may not be possible to read some columns as factors and others as plain strings in the same call. Of course, you can always convert a column to a factor after having read it as a different type.
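
A minimal sketch of that post-read conversion. To keep it runnable without the xlsx package or the spreadsheet, it uses a stand-in data frame built from the six rows shown in the question's head() output; in practice mydata would be whatever read.xlsx returned.

```r
# Stand-in for the result of read.xlsx (an assumption for illustration):
# the six rows printed by head(mydata) in the question.
mydata <- data.frame(
  Treatment    = c(1, 1, 1, 1, 1, 2),
  Nitrate_conc = c(12, 12, 15, 16, 12, 18)
)

# Convert only the column that should be categorical; other columns are untouched.
mydata$Treatment <- factor(mydata$Treatment)

str(mydata)
```

Because the conversion is done per column, this also gets around the all-or-nothing behaviour of stringsAsFactors: any character columns that should stay as strings are simply left alone.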

G5W
  • 36,531
  • 10
  • 47
  • 80