
I am experimenting with different packages to find the best way to save data files, such as CSV files, quickly.

I have found the 'iotools' package and its 'write.csv.raw' function, which is quite good in terms of elapsed time when saving data.

However, the data in the saved file has some problematic features:

  • no column names;
  • double/float numbers use "." as the decimal sign instead of ",".

So I need the saved file to include column names and to use the correct decimal sign.
My script is as follows:

library(iotools)
library(UsingR)

data(galton)
head(galton)
# option 1 to save data
write.csv.raw(galton, "test.csv", append = FALSE, sep = ";", col.names = TRUE)
# option 2 to save data
write.table.raw(galton, "test.csv", append = FALSE, sep = ";", col.names = TRUE)
# read the first rows back (read.csv2 expects ";" as separator and "," as decimal sign)
read.csv2("test.csv", nrows = 5)

The input dataset (from R):

child parent
61.7   70.5
61.7   68.5
61.7   65.5
61.7   64.5
61.7   64.0
62.2   67.5

The output (the saved file read back with read.csv2):

X1.61.7 X70.5
2\t61.7  68.5
3\t61.7  65.5
4\t61.7  64.5
5\t61.7    64
6\t62.2  67.5  
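The odd header can be reproduced without iotools. This is a minimal sketch, assuming the saved file starts each row with a tab-separated row index and uses "." decimals, as the output above suggests; the three lines below are typed in by hand, not taken from a real write.csv.raw run. read.csv2 treats the first data line as the header and converts it into syntactic names via make.names, which yields exactly the "X1.61.7" and "X70.5" seen above.

```r
# Hand-written lines mimicking the saved file: a row index followed by a tab,
# then the values separated by ";" with "." as the decimal sign
lines <- c("1\t61.7;70.5", "2\t61.7;68.5", "3\t61.7;65.5")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# read.csv2 expects a header line, sep = ";" and dec = ","
x <- read.csv2(tmp)

# The first data line was consumed as the header and mangled by make.names()
print(names(x))  # [1] "X1.61.7" "X70.5"
```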

Update of 18/02/16:
With the help of the answer by procrastinator0, I have managed to use 'write.csv.raw' correctly.

Here is a comparison of different write methods on the data frame from the question:

system.time(write.csv.raw(n, "test.csv", sep = ";", append = TRUE))
   user  system elapsed
  15.61    1.17   21.92

system.time(write.table(n, "test.csv", sep = ";", row.names = FALSE, dec = ","))
   user  system elapsed
  63.25    1.20   64.60

system.time(write.csv2(n, "test.csv", row.names = FALSE))
   user  system elapsed
  63.71    1.28   65.38

system.time(write_csv(n, "test.csv", na = "NA"))
   user  system elapsed
 136.75    3.60  141.24
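The timings above come from the real data set, which is not shared. A minimal sketch of a reproducible harness for the base R writers, assuming a synthetic, all-numeric data frame built by a hypothetical helper make_sample (the size is arbitrary and far smaller than 386000 × 140, so no particular timings are implied). It also checks that write.table(..., dec = ",") and write.csv2() produce the same file for numeric data.

```r
# Hypothetical helper: synthetic all-numeric data frame of a given size
make_sample <- function(n_rows, n_cols, seed = 1) {
  set.seed(seed)
  as.data.frame(matrix(round(runif(n_rows * n_cols, 0, 100), 1),
                       nrow = n_rows, ncol = n_cols))
}

n <- make_sample(10000, 20)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

# Time two base R writers that both use ";" as separator and "," as decimal sign
t1 <- system.time(write.table(n, f1, sep = ";", row.names = FALSE, dec = ","))
t2 <- system.time(write.csv2(n, f2, row.names = FALSE))
print(c(write.table = t1[["elapsed"]], write.csv2 = t2[["elapsed"]]))

# For all-numeric data both calls should write identical files
same_output <- identical(readLines(f1), readLines(f2))
```

Other writers (iotools, readr, data.table) can be timed the same way by adding further system.time() calls on the same n.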

Update of 27/04/16 (out of date):
I have run some experiments writing and reading data with different tools. The experiments are based on a theoretical sample as well as a real one (from my own practice). I have tried to make the scripts reproducible. I hope they will be useful for newcomers :-)

Links to IO experiments:

Reading data from files: https://rpubs.com/demydd/166375
Writing data to files: https://rpubs.com/demydd/170957

Update of 19/09/16:
The feather package (read_feather, write_feather) and fwrite from the data.table package have been added.
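For reference, a minimal sketch of a data.table::fwrite call producing the format asked for in the question (header, ";" as separator, "," as decimal sign). It assumes data.table is installed and skips the write otherwise; the small data frame is a stand-in for galton.

```r
# Small stand-in for the galton data
df <- data.frame(child = c(61.7, 61.7, 61.7), parent = c(70.5, 68.5, 65.5))
out <- tempfile(fileext = ".csv")

written <- NULL
back <- NULL
if (requireNamespace("data.table", quietly = TRUE)) {
  # fwrite writes column names by default; dec sets the decimal sign
  data.table::fwrite(df, out, sep = ";", dec = ",")
  written <- readLines(out)
  # read.csv2 expects exactly this format, so the round trip should be lossless
  back <- read.csv2(out)
}
```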

Links to the updated tests:

to read
to write

Dimon D.
  • Question is unclear, saying no columns, yet writing out with `col.names=TRUE`. Data is not controversial, what is the question? – zx8754 Feb 09 '16 at 15:09
  • It would also be interesting to know (roughly) the dimension of your real data that you're trying to write. – talat Feb 09 '16 at 15:20
  • @zx8754: I mean the data in the saved file. If I open the file, I see no column names and the decimal sign "." in place of ",". @docendo discimus: the initial dataset was 386000 rows and 140 columns (numeric and non-numeric). After applying 'write.csv.raw' I get neither column names nor the correct decimal sign. After that I started testing small samples such as galton. – Dimon D. Feb 09 '16 at 15:27
  • Is `write.csv.raw` faster than `write_csv{readr}`? Your question is about the fastest method to write a `.csv` file, right? – rafa.pereira Feb 17 '16 at 17:05
  • @Rafael Pereira You are right. I am looking for the fastest way to write CSV correctly. So far the fastest methods according to my tests are fread() and write.csv.raw(). But I have not tested write_csv{readr} yet. If you can offer something better, you are very welcome :-) – Dimon D. Feb 17 '16 at 17:13
  • You might also want to take a look at `fwrite {data.table}`. [It's really fast](http://stackoverflow.com/questions/12013953/write-csv-for-large-data-table-in-r) – rafa.pereira Jun 01 '16 at 07:37
  • @Rafael Pereira thanks for the hints :-) I have already used it, but I have to admit that fwrite comes from a package version that is still under development, at least at the time of this comment. Another option to save data fast (in a binary format) is 'write_feather' {feather}. – Dimon D. Jun 01 '16 at 12:47

2 Answers


For column names, this is a known issue. Suggested workaround:

library(iotools)
# write.csv.raw omits the header, so write it first and then append the data
cat(paste(names(df), collapse = ","), "\n", sep = "", file = "output.csv")
write.csv.raw(df, "output.csv", append = TRUE)

For me, write.csv.raw does not prefix each row with an index followed by "\t" by default, but you could try setting the nsep argument to NA.
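The same header trick can be combined with a fix for the decimal sign. This is a minimal sketch, not part of the answer above: write_semicolon_csv is a hypothetical helper that writes the header line with cat(), converts numeric columns to text with "," as the decimal sign, and then appends the rows through a pluggable writer. Base write.table is the default so the sketch runs without iotools; passing a faster appending writer (for example one wrapping iotools' write.csv.raw with append = TRUE) is an untested assumption.

```r
# Hypothetical helper: header line + "," decimal sign + appended data rows
write_semicolon_csv <- function(df, file, append_rows = NULL) {
  # 1. Header line, written with cat() as in the workaround above
  cat(paste(names(df), collapse = ";"), "\n", sep = "", file = file)

  # 2. Convert numeric columns to character, replacing "." with ","
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) sub(".", ",", as.character(x), fixed = TRUE))

  # 3. Append the data rows (default: base R writer without quotes or row names)
  if (is.null(append_rows)) {
    append_rows <- function(d, f) {
      write.table(d, f, sep = ";", append = TRUE, quote = FALSE,
                  row.names = FALSE, col.names = FALSE)
    }
  }
  append_rows(df, file)
  invisible(file)
}

df <- data.frame(child = c(61.7, 61.7), parent = c(70.5, 64))
out <- tempfile(fileext = ".csv")
write_semicolon_csv(df, out)
```

Note that as.character() may switch to scientific notation for very large or very small values; format(x, scientific = FALSE) is one alternative if that matters.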


You can save the column names and reassign them after reading the file back, as follows:

library(iotools)
library(UsingR)

data(galton)

# keep the original column names
Cnames <- colnames(galton)

write.table(galton, "test2.csv", sep = ";")

test2 <- read.delim("test2.csv", sep = ";")
colnames(test2) <- Cnames

The output is:

head(test2)
  child parent
1  61.7   70.5
2  61.7   68.5
3  61.7   65.5
4  61.7   64.5
5  61.7   64.0
6  62.2   67.5
knifer
  • @knifer: thanks for the help. But it seems to me that you are not using the tools from the iotools package ('write.csv.raw' or 'write.table.raw'). These methods reduce the time cost greatly (by about 3x in the case of my real dataset, 386000 × 140). Using plain 'write.table' gives no gain in time. – Dimon D. Feb 09 '16 at 17:13