1

I am using the read.csv function with the colClasses parameter to read my CSV file. For every column whose class is "factor", I want to specify the order of the factor levels, i.e.:

If the column "Liquid-type" has the levels "Water", "Juice", "Soda", and "Alcohol", I want to control their ordering, say:

Water = 3
Juice = 1
Soda = 2
Alcohol = 0

So how can I control the order of factor levels in read.csv?

Edit: Your comment below, formatted:

setClass("customFactor")
setAs("character", "customFactor", function(from) SpecifyOrders(from))
# new_order: character vector of levels in the desired order, defined elsewhere
SpecifyOrders <- function(from) {
  factor(from, levels = new_order)
}
IRTFM
blank
  • You don't. You re-order the factors once they've been read in using `factor` and specifying the levels in the order you want. – joran Sep 18 '13 at 19:20

2 Answers

2

One question would be "why?" Other related questions: do you just want to relevel the factor, do you really want an ordered factor, or do you want to recode to numeric values?

To relevel with that order you might do this after data input:

Liquid.type <- factor(Liquid.type, levels=c("Alcohol","Juice","Soda","Water"))

(Although that would have already been the order since the default ordering is alpha-sorted.) If you want to get the values 0-3 from that factor:

Liquid.type <- as.numeric(Liquid.type) -1
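
As a minimal sketch of the two steps above, the following uses a small made-up vector (the sample values and the `codes` name are assumptions for illustration, not the asker's data). It keeps the factor and the zero-based codes in separate objects so both can be inspected.

```r
# Hypothetical sample values standing in for the "Liquid-type" column.
Liquid.type <- c("Water", "Soda", "Juice", "Alcohol", "Water")

# Set the level order explicitly; the first level gets integer code 1.
Liquid.type <- factor(Liquid.type,
                      levels = c("Alcohol", "Juice", "Soda", "Water"))
levels(Liquid.type)

# Integer codes shifted to start at 0 (Alcohol = 0, ..., Water = 3).
codes <- as.integer(Liquid.type) - 1L
codes
```

Changing the `levels` vector is all it takes to get a different coding; the data values themselves are untouched.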

There are methods to do this at the time of read.table or read.csv, but they are somewhat more baroque and involved, and we would need to see a use case to justify the effort.

IRTFM
  • I want to do this because eventually I will be building a logistic regression model from the read.csv data frame, and I want to specify the order of the factor levels before I do any logistic regression modelling. Does this qualify as a use case for doing it during read.csv? – blank Sep 18 '13 at 20:26
  • Not as far as I can see. You would need to define a separate class for each column and using `var <- factor(var, levels=new_order)` seems a lot easier than doing all of that. – IRTFM Sep 18 '13 at 20:27
  • I will be working with large datasets and want the script to be memory efficient. Changing the order after the read.csv call will result in increased memory usage, which I really want to avoid. So I am hoping that I can make it work in read.csv. – blank Sep 18 '13 at 20:42
  • I think I got it to work by creating my own class: `setClass("customFactor"); setAs("character", "customFactor", function(from) { SpecifyOrders(from) }); SpecifyOrders <- function(from) { from <- factor(from, levels=c(new_order)) }` – blank Sep 18 '13 at 20:47
  • You can find a worked example here: http://stackoverflow.com/questions/5068705/processing-negative-number-in-accounting-format/5069649#5069649 – IRTFM Sep 18 '13 at 20:50
  • Another example: http://stackoverflow.com/questions/8081451/read-table-and-apply-functions-to-a-column/8081810#8081810 – IRTFM Sep 18 '13 at 21:07
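
For the modelling use case raised in the comments, what usually matters is which level comes first, because R's default treatment contrasts use the first level of an unordered factor as the baseline. A minimal sketch of reordering after the read, assuming a made-up data frame `df` and a hypothetical `new_order` vector:

```r
# Hypothetical data frame, as if freshly returned by read.csv().
df <- data.frame(
  Liquid.type = c("Water", "Soda", "Juice", "Alcohol", "Water", "Soda"),
  stringsAsFactors = TRUE
)
levels(df$Liquid.type)   # alphabetical by default

# Full reorder: list every level in the desired order.
new_order <- c("Water", "Juice", "Soda", "Alcohol")  # assumed order
df$Liquid.type <- factor(df$Liquid.type, levels = new_order)

# Or move just one level to the front to make it the baseline.
df$Liquid.type <- relevel(df$Liquid.type, ref = "Soda")
levels(df$Liquid.type)
```

`relevel()` only moves the reference level to the front and leaves the remaining levels in their existing order, so it is the lighter option when only the baseline matters.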
1

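So I think I figured out the answer:

setClass("customFactor")
setAs("character", "customFactor", function(from) SpecifyOrders(from))
# new_order: character vector of levels in the desired order, defined elsewhere
SpecifyOrders <- function(from) { factor(from, levels = new_order) }

# read.csv takes the path as `file`, not `data`; `data` is assumed to hold the path
Dataframe <- read.csv(file=data, colClasses=c("character","customFactor","numeric"))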
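
A self-contained sketch of this approach, which can be run without a file on disk. The inline CSV text, the column names, and the `new_order` values are made up for illustration; `read.csv` reads them through its `text` argument instead of a path.

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical level order; replace with the order you need.
new_order <- c("Water", "Juice", "Soda", "Alcohol")

# Register a virtual class plus a character -> customFactor coercion.
setClass("customFactor")
setAs("character", "customFactor",
      function(from) factor(from, levels = new_order))

# Inline CSV text stands in for a real file (made-up rows).
csv_text <- "id,Liquid.type,volume
a,Soda,1.5
b,Water,0.5
c,Alcohol,0.7
d,Juice,1.0"

Dataframe <- read.csv(text = csv_text,
                      colClasses = c("character", "customFactor", "numeric"))
levels(Dataframe$Liquid.type)
```

Each factor column that needs its own level order would need its own class and `setAs()` registration, which is the overhead mentioned in the comments on the other answer.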
blank
  • FYI - I would be very surprised if this were really any more efficient in terms of memory usage. Whether you call `factor` to reset the levels within `read.table` or afterwards should have the exact same memory implications. If it were me, I would benchmark this just to be sure, and include doing things like simply calling `levels<-`, which shouldn't have much overhead. – joran Sep 18 '13 at 21:33
  • You are right, and I guess I might do the refactoring after reading the CSV instead of creating my own class. – blank Sep 19 '13 at 00:03
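
A rough way to check the memory point, sketched on synthetic data (the vector size, the seed, and the `new_order` values are assumptions). It compares a factor reordered after creation with one built directly in the desired order, which is what the custom-class route does inside `read.csv`.

```r
# Made-up data: 100,000 values drawn from four levels.
set.seed(1)
new_order <- c("Water", "Juice", "Soda", "Alcohol")  # hypothetical order
x <- sample(new_order, 1e5, replace = TRUE)

# Default factor (alphabetical levels), then reorder afterwards.
f_default   <- factor(x)
f_reordered <- factor(f_default, levels = new_order)

# Factor built directly with the desired order.
f_direct <- factor(x, levels = new_order)

# Compare the results and their sizes, then time the reordering step.
identical(f_reordered, f_direct)
object.size(f_reordered) == object.size(f_direct)
system.time(factor(f_default, levels = new_order))
```

This only compares the finished objects and the cost of one reordering call; measuring peak memory during a real `read.csv` on a large file would need a profiler such as `Rprofmem()` or `gc()` snapshots.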