Why do we use packages to read CSV files in R instead of built-in functions like read.csv?
- Speed: https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r – user20650 Aug 19 '17 at 18:29
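On the speed point, here is a minimal sketch (the data and file are made up for illustration, and `data.table` is assumed to be installed) comparing base `read.csv` with `data.table::fread` on the same file:

```r
# Write a moderately large CSV to a temp file, then time both readers.
n  <- 1e5
df <- data.frame(x = rnorm(n), y = sample(letters, n, replace = TRUE))
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# Base R reader: parses and type-guesses line by line.
base_time <- system.time(a <- read.csv(tmp))["elapsed"]

# data.table::fread: memory-maps and parses in parallel; usually much faster.
if (requireNamespace("data.table", quietly = TRUE)) {
  fread_time <- system.time(b <- data.table::fread(tmp))["elapsed"]
}
```

On files this small the difference is modest; the gap grows dramatically as the file approaches millions of rows, which is the scenario the linked question is about.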
- Big data and CSV don't mix. CSV is an immensely wasteful storage format for numerical data, and if used for floating-point values it also has *severe* numerical problems. It needs parsing, you need to read the first 999,999 lines to know the first value of the 1 millionth row, it's not a well-defined format, there are plenty of pitfalls, and it's not robust. In short, it's a shitty format for things that aren't small tables of mostly text. Rule of thumb: avoid CSV for anything with more than 1000 lines – no one needs that in plain text, anyway. – Marcus Müller Aug 19 '17 at 18:33
- I completely agree with @MarcusMüller here, but there is no denying that the format has a large fan base, and if handed such a file ... `data.table::fread()` is excellent and very fast. But CSV is still an inferior storage mode, though a little less bad than some of the others. Binary will always win, but it is harder to do portably. – Dirk Eddelbuettel Aug 19 '17 at 18:34
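Even when you stick with base `read.csv`, much of its slowness comes from type-guessing; a common tip from the linked question is to declare column types up front. A small sketch (file and column layout invented for illustration):

```r
# Write a tiny two-column CSV: integer x, character y.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:1000, y = letters[(0:999) %% 26 + 1]),
          tmp, row.names = FALSE)

# Telling read.csv the types (colClasses) and row count (nrows) up front
# skips the guessing pass and avoids reallocation while reading.
df <- read.csv(tmp, colClasses = c("integer", "character"), nrows = 1000)
```

This narrows the gap to `fread`, but `fread` still wins on large files because it parses in parallel and detects types from a sample rather than the whole file.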
- @DirkEddelbuettel thanks! I nearly completely agree: binary formats are hard to implement portably. But you basically never implement them yourself – you use an established format with a well-tested portable library (think HDF5, for example). Also, CSV isn't any easier to get portable. I had a real-world application break down. Why? Because someone was writing CSV with some Windows library. That library had localization, and they were using a German Windows. The German decimal separator is the comma – so "comma-separated values" isn't even language-portable. – Marcus Müller Aug 19 '17 at 18:37
- And that "someone" was actually the driver of a data acquisition device. – Marcus Müller Aug 19 '17 at 18:38
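The locale pitfall described above has a direct counterpart in base R: a "CSV" written on a German-locale system typically uses `;` as the field separator and `,` as the decimal mark, and base R ships `read.csv2()` for exactly that dialect. A sketch with made-up data:

```r
# A German-style "CSV": semicolon-separated fields, comma decimal mark.
tmp <- tempfile(fileext = ".csv")
writeLines(c("value;label",
             "3,14;pi",
             "2,72;e"), tmp)

# read.csv() would mangle this; read.csv2() defaults to sep = ";" and
# dec = ",", so the values parse as proper numerics.
de <- read.csv2(tmp)
```

The point stands, though: the fact that two incompatible dialects both call themselves "CSV" is exactly why the format isn't portable.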
- I too have an Umlaut in my name and know the problem, even if I haven't lived over there in 25+ years :) Now, UTF-8 helps a little and localization could bite you, but yes – binary, when you can, is nice. Potentially faster _and_ more robust _and_ more precise. – Dirk Eddelbuettel Aug 19 '17 at 18:40
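To illustrate the "more precise" point without reaching for any package: base R's `saveRDS`/`readRDS` round-trip a binary serialization bit-for-bit, whereas a CSV round-trip goes through a decimal text representation. A minimal sketch:

```r
x <- c(pi, exp(1), 1/3)

# Binary round-trip: no parsing, no locale, full double precision.
tmp_rds <- tempfile(fileext = ".rds")
saveRDS(x, tmp_rds)
y <- readRDS(tmp_rds)
# identical(x, y) is TRUE: bit-for-bit identical doubles.

# CSV round-trip: the doubles are printed as decimal text and re-parsed,
# so the values typically come back rounded rather than exact.
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x), tmp_csv, row.names = FALSE)
z <- read.csv(tmp_csv)$x
```

For tabular data one would normally use an established binary format (RDS, HDF5, Parquet, etc.) rather than hand-rolling one, which is the point made in the comments above.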