I have data.frames with character columns containing numbers (like '0123', '1234', etc.). When I write them to csv and read them back, they end up as numeric columns. The write.csv and read.csv functions have quote arguments, and by default should quote character strings on output and respect them on input, so this behavior is unexpected.
How can I avoid this without manually specifying colClasses when I read the file back in?
Reproducible example:
# dummy data
fake_data <- data.frame(num = 1:25, char = letters[1:25], charnum = as.character(1:25),
                        stringsAsFactors = FALSE)
# check out col classes - all good
sapply(fake_data, class)
#         num        char     charnum
#   "integer" "character" "character"
# write it to a file and read it back
fpath <- '~/Desktop/fake_data.csv'
write.csv(fake_data, fpath, row.names = FALSE)
fake_data2 <- read.csv(fpath, stringsAsFactors = FALSE)
# but now look, different classes!
sapply(fake_data2, class)
#         num        char     charnum
#   "integer" "character"   "integer"
It seems like the error is on the read side, since the file is being written with quotes.
> cat(readLines(fpath))
"num","char","charnum" 1,"a","1" 2,"b","2" 3,"c","3" 4,"d","4" 5,"e","5" 6,"f","6" 7,"g","7" 8,"h","8" 9,"i","9" 10,"j","10" 11,"k","11" 12,"l","12" 13,"m","13" 14,"n","14" 15,"o","15" 16,"p","16" 17,"q","17" 18,"r","18" 19,"s","19" 20,"t","20" 21,"u","21" 22,"v","22" 23,"w","23" 24,"x","24" 25,"y","25"
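To take the file out of the picture, here is a smaller sketch that feeds quoted text straight to read.csv through its text argument. The column name x and the two values are made up for illustration. As far as I can tell, the quotes are stripped before the column type is guessed, so quoting alone does not keep a field as character:

```r
# Hypothetical in-memory CSV: every field is quoted, one value has a leading zero
csv_text <- '"x"\n"0123"\n"1234"'

# Default read: the column type is guessed from the unquoted field contents
guessed <- read.csv(text = csv_text)
class(guessed$x)

# Forcing the class keeps the strings (and the leading zero) intact,
# but this is the manual colClasses step I would like to avoid
kept <- read.csv(text = csv_text, colClasses = "character")
kept$x
```

On my machine the first call gives an integer column (so '0123' becomes 123), while the second keeps both values as character.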
sessionInfo:
R version 3.1.1 (2014-07-10) | Platform: x86_64-apple-darwin13.1.0 (64-bit)