I have several .csv files in a folder. I want to read them all at once using the following commands:

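library(data.table)
path <- path
# pattern is a regular expression, so match the literal ".csv" extension;
# full.names = TRUE returns paths that already include the folder
files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
DT <- do.call(rbind, lapply(files, fread))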

However, since the first column holds 12-digit numbers, data.table displays it in scientific notation, like

5.43971221673e-313

How can I convert the values shown in scientific notation back into ordinary integers?

Thanks a lot!

First edit: After I run the command

options("scipen"=100, "digits"=12)

the number is still shown as

5.43971221673e-313

Even after I run the command

options(scipen=999)

I get back the number

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000543971221673

And even 543971221673 is not the correct number. The correct one should be

110101001001

I tried the conversion with a data.frame, and it works:

a <- read.csv(files[1])  # read.csv reads one file at a time
a[,1] <- as.character(a[,1])

But I would prefer to use data.table for speed.

Thank you guys!

Yijiao Liu
  • Please use `rbindlist(lapply(files, fread))`; `do.call(rbind, ...)` is very slow. – Matt Dowle Mar 18 '17 at 00:54
  • @MattDowle Why would the value change when you read this number with `fread('110101001001\n')`? – Sathish Mar 18 '17 at 01:14
  • @Sathish I found the answer in [Convenience features of fread](https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread): set `options(datatable.integer64="character")`. – Yijiao Liu Mar 18 '17 at 01:21
  • @YijiaoLiu Please post a demonstration of what you tried, with the output, as a separate answer. – Sathish Mar 18 '17 at 01:22
  • @Sathish thanks a lot! – Yijiao Liu Mar 18 '17 at 01:24
  • You said you _would prefer to use data.table to make it fast_. If all files have the same structure, consider using `DT <- rbindlist(lapply(files, fread))` instead of `DT <- do.call(rbind, lapply(files, fread))`. By default, `rbindlist()` combines columns by position, which is about 2 times faster than combining by column names. For details, please refer to the explanations and benchmarks in the answers to [Why is rbindlist “better” than rbind?](http://stackoverflow.com/q/15673550/3817004). – Uwe Mar 18 '17 at 10:08
  • @UweBlock Glad to know, I have corrected it :) – Yijiao Liu Mar 19 '17 at 19:59
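
The `rbindlist()` suggestion from the comments can be sketched as follows. This is a minimal illustration, assuming data.table is installed; the two in-memory tables and their column names (`id`, `value`) are made up and merely stand in for what `lapply(files, fread)` would return.

```r
library(data.table)

# Hypothetical stand-ins for the tables that lapply(files, fread) would return.
parts <- list(
  data.table(id = c("a", "b"), value = c(1, 2)),
  data.table(id = c("c", "d"), value = c(3, 4))
)

# The approach from the question: rbind applied to the list through do.call
# (reported in the comments to be slow).
slow <- do.call(rbind, parts)

# The suggested approach: rbindlist binds the whole list in a single call.
fast <- rbindlist(parts)

# Both approaches should produce the same table when the columns line up.
print(isTRUE(all.equal(slow, fast)))
```

When the files share the same column layout, the two results should match, so switching to `rbindlist()` only changes how long the combination takes.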

2 Answers

options(scipen = 999)
data.table(a = c(1e15, 2e15))
#                   a
# 1: 1000000000000000
# 2: 2000000000000000

options(scipen = 4)
data.table(a = c(1e15, 2e15))
#        a
# 1: 1e+15
# 2: 2e+15
Sathish
  • Hi, thank you, but instead, it shows 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000543971221678;and 543971221678 is not the correct number, which should be 110101001002 – Yijiao Liu Mar 18 '17 at 00:59
  • That long string occurs because you turned off scientific notation. – Sathish Mar 18 '17 at 01:09
  • To change it back to the default, use `options(scipen = 0)`. – Sathish Mar 18 '17 at 01:10
  • The value is correct based on the data in your question, which is 5.43971221673e-313. – Sathish Mar 18 '17 at 01:10
  • @YijiaoLiu yes, you are right. – Sathish Mar 18 '17 at 01:14
  • But if I import the data with `read.csv` instead of data.table and then apply `as.character()` to the column, `110101001002` is shown. – Yijiao Liu Mar 18 '17 at 01:16
  • I tried your suggestion, but it does not work for me. I tried this command: `a1 <- fread('110101001001\n'); as.character(a1$V1)` – Sathish Mar 18 '17 at 01:18

This issue is solved (at least temporarily) by the reference here: [Convenience features of fread](https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread).

'fread automatically detects large integers (> 2^31) and reads them as type integer64 from the bit64 package. '
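
To see why such a column can show up as a tiny number like 5.43971221673e-313: bit64 stores each 64-bit integer in the 8 bytes of a double, so when those bytes are printed as an ordinary double the result looks like a very small floating-point value. The base-R sketch below reproduces that reinterpretation for the value from the question. It only illustrates the bit pattern and is not how fread works internally; the variable names are made up.

```r
value <- 110101001001   # the ID the question expects to see

# Base R has no 64-bit integer type, so split the value into two 32-bit halves.
hi <- value %/% 2^32
lo <- value %% 2^32
lo_signed <- if (lo >= 2^31) lo - 2^32 else lo   # fit into a signed 32-bit integer

# Write the halves as 8 little-endian bytes, then reread the same bytes as one double.
bytes <- writeBin(as.integer(c(lo_signed, hi)), raw(), size = 4, endian = "little")
as_double <- readBin(bytes, what = "double", n = 1, size = 8, endian = "little")

# The result is a denormal double on the order of 5.4e-313.
print(as_double, digits = 12)
```

The printed value should be close to the number reported in the question, which suggests the data were read correctly and only displayed without integer64 support.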

You just need to install bit64:

install.packages("bit64")

or set

options(datatable.integer64="character")

before calling fread; then it works. For example:

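library(data.table)
path <- path
# pattern is a regular expression, so match the literal ".csv" extension;
# full.names = TRUE returns paths that already include the folder
files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
options(datatable.integer64 = "character")  # read large integers as character
DT <- rbindlist(lapply(files, fread))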
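
As a narrower alternative to the global option, `fread()` also accepts an `integer64` argument on each call. The sketch below assumes data.table is installed and uses a small inline CSV with made-up column names (`id`, `value`) in place of the real files.

```r
library(data.table)

# Hypothetical inline data standing in for one of the .csv files.
csv_text <- "id,value\n110101001001,1\n110101001002,2\n"

# Read large integers as character for this call only,
# leaving the global datatable.integer64 option untouched.
DT <- fread(csv_text, integer64 = "character")

print(DT$id)
```

With real files, the same argument could be passed through `lapply`, for example `lapply(files, fread, integer64 = "character")`.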

Thanks to @Sathish; this is the first time I have asked an R question here!

Matt Dowle
Yijiao Liu
  • Which version of data.table are you using? I suspect it's less than v1.10.2 which contained news item: `When fread() or print() see integer64 columns are present, bit64's namespace is now automatically loaded for convenience.` – Matt Dowle Mar 19 '17 at 09:29
  • @MattDowle I am using data.table 1.10.4; my R version is 3.3.2. – Yijiao Liu Mar 19 '17 at 19:58
  • Ok. Did the warning message asking you to install the `bit64` package appear, and did you install it? Setting `options(datatable.integer64="character")` is not recommended unless you really want numbers like that as character. Please remove that option, check that you see the warning that the `bit64` package is not installed, then install `bit64` and try again. – Matt Dowle Mar 20 '17 at 04:24
  • I've just discovered that the intended warning when `bit64` is not installed does not appear, due to a bug in that part of the code. I've just fixed it [here](https://github.com/Rdatatable/data.table/commit/0351b48e18c5c804e1dda81fcb65f78e9ab2383f). Sorry about that. – Matt Dowle Mar 22 '17 at 21:34
  • @MattDowle Hi Matt, these days I have been busy with data cleaning, and I have tried the bit64 package a little but not fully. Thank you so much; I am working on my thesis with this dataset now. – Yijiao Liu Mar 22 '17 at 23:03