I am reading in some files created by another program. The program marks missing values with the number -99.9.

I am speeding up some code by switching from base read.table() to fread() from the data.table package to read in these data files. With read.table I can use na.strings=c(-99.9), but fread does not seem to accept numeric arguments for na.strings. The string counterpart, na.strings=c("-99.9"), does not work either, and gives me the error: 'na.strings' is type '.

Can I make fread read in the number -99.9 as NA?

ialm
  • If you're comfortable with sed/awk, you could do this on the command line: substitute -99.9 with NA, then read the file in with fread. – infominer Mar 11 '14 at 17:07
  • I checked whether fread can take a colClasses argument. From the documentation: "All controls such as sep, colClasses and nrows are automatically detected. bit64::integer64 types are also detected and read directly without needing to read as character before converting." So your best bet is to do a sed substitution and then load the file(s) in R. This might also help: http://www.biostat.jhsph.edu/~rpeng/docs/R-large-tables.html – infominer Mar 11 '14 at 17:17
  • @infominer I am aware of the `colClasses` argument, but I am afraid that I may not have prior knowledge about the column classes in the data files in the scripts. There are different types of files, and too many to hard code column classes for all of them. Will consider using sed/awk, or doing a post-processing and replacing -99.9 after reading in the files. – ialm Mar 11 '14 at 18:19
  • It's on the list to make `na.strings` work as you expected: [#2660](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2660&group_id=240&atid=975). I've added a link there back to here. Hopefully soon. – Matt Dowle Mar 12 '14 at 01:25

1 Answer

If you read the file in as dt, change those values to NA afterwards:

dt[dt == -99.9] <- NA

Problem solved?

JeremyS
  • Yes, but that won't be efficient. Use `set()` or `:=` like this: http://stackoverflow.com/a/7249454/403310 – Matt Dowle Mar 12 '14 at 01:20
  • Yes, this is what I ended up doing yesterday. I used a for loop with `set` over the columns to replace `-99.9` with `NA`s. – ialm Mar 12 '14 at 18:17
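
For reference, the `set()` loop mentioned in the comments above might look roughly like the sketch below. It assumes the data.table package is installed. The sample table and the names `dt` and `sentinel` are made up for illustration; in practice `dt` would come from `fread()`.

```r
library(data.table)

# Hypothetical stand-in for a table returned by fread();
# -99.9 marks a missing value, as in the question.
dt <- data.table(
  id    = 1:3,
  temp  = c(-99.9, 20.5, -99.9),
  label = c("a", "b", "c")
)

sentinel <- -99.9  # assumed missing-value code

# Loop over the columns and overwrite matching cells by reference.
# Only numeric columns are checked, since the sentinel is a number.
for (j in seq_along(dt)) {
  if (is.numeric(dt[[j]])) {
    rows <- which(dt[[j]] == sentinel)
    if (length(rows) > 0L) set(dt, i = rows, j = j, value = NA)
  }
}

print(dt)
```

Because `set()` modifies the table in place, this avoids building the full logical matrix and the copy that `dt[dt == -99.9] <- NA` implies. Column classes do not need to be known in advance, which fits the case of many differently shaped files.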