1

A strange problem with write and read.csv. I have ways to work around this but would be great if someone can identify what is going on.

I have code from someone else which dynamically creates a series of CSVs by appending new rows. The problem is that read.csv appears to read the newly created csv inconsistently.

Dummy code example:

datfile <- "E:/temp.csv"
write(paste("Name","tempname",sep=","),datfile,1)
write(paste("VShort",50,sep=","),datfile,1,append=T)
write(paste("Short1",1,1,sep=","),datfile,1,append=T)
write(paste("Short2",0,2,sep=","),datfile,1,append=T)
write(paste("Short3",0,2,sep=","),datfile,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile,1,append=T)
write(paste("Short4",2,0,sep=","),datfile,1,append=T)
read.csv(datfile,header=F,colClasses="character")

Seven rows of data written to CSV, but read.csv reads in 8 rows (Long is split over two rows). Eight rows and three columns read in.

The problem is fixed by opening temp.csv in Excel and saving. Then read.csv reads in the 7 lines appropriately.

The problem only appears to exist under certain conditions. For example, remove Short 3 and there is no problem:

datfile2 <- "E:/temp2.csv"
write(paste("Name","tempname",sep=","),datfile2,1)
write(paste("VShort",50,sep=","),datfile2,1,append=T)
write(paste("Short1",1,1,sep=","),datfile2,1,append=T)
write(paste("Short2",0,2,sep=","),datfile2,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile2,1,append=T)
write(paste("Short4",2,0,sep=","),datfile2,1,append=T)
read.csv(datfile2,header=F,colClasses="character") 

Six rows and five columns are read in.

Any ideas what is going on here?

R version 3.2.4 Revised

Windows 10

Adrian
  • 684
  • 3
  • 20

1 Answers1

2

This is probably related to the following in ?read.csv:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).

It just happens that the row with the most number of columns is the sixth row in your first example.

I suggest using col.names to get around this, e.g.:

`... read.csv(..., col.names = paste0('V', 1:6))`

As the OP notes in a comment to this answer, you can find out the number of columns required using readLines:

Ncol <- max(unlist(lapply(strsplit(readLines(datfile), ","), length)))

and then modify the above to give:

read.csv(datfile,header=F,colClasses="character", col.names=paste0("V", 1:Ncol))
Alex
  • 15,186
  • 15
  • 73
  • 127
  • Ahh thank you. Shows again that a careful read of the help material would have answered the question. I'll give col.names a go, but the number of columns is not fixed so may not be straightforward. – Adrian May 12 '16 at 08:37
  • 1
    I ended up using: `Ncol <- max(unlist(lapply(strsplit(readLines(datfile), ","), length)))` and `read.csv(datfile,header=F,colClasses="character", col.names=paste0("V", 1:Ncol))`. I found [this solution](http://stackoverflow.com/questions/18922493/how-can-you-read-a-csv-file-in-r-with-different-number-of-columns) helpful as well – Adrian May 13 '16 at 02:09