automatically convert NAs to 0 while reading CSV

Question

In an attempt to save space I omitted zeros from my CSV files as a sort of sparse-ish representation (all data is numeric):

table = read.csv(text = "
V1,V2,V3
0.3,1.2,1.5
0.5,,2.1
,.1,")

Here's what I get:

> table

   V1  V2  V3
1 0.3 1.2 1.5
2 0.5  NA 2.1
3  NA 0.1  NA

I can go ahead and change the NAs to 0:

table[is.na(table)] = 0

    V1  V2  V3
1: 0.3 1.2 1.5
2: 0.5 0.0 2.1
3: 0.0 0.1 0.0

Is there a one-liner to do this while reading the file in, preferably with data.table's fread?

table = fread("
V1,V2,V3
0.3,1.2,1.5
0.5,,2.1
,.1,")

More info: the reason I'd like to avoid

table[is.na(table)] = 0

is that while fread on my data is really fast, this operation is quite slow! (Not sure exactly why.) My dataset is 336 rows x 3939 columns. (G. Grothendieck's custom class answer is fast, thanks for that idea!)

Just found http://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table — stackoverflax, Jan 03 '14 at 19:23

score 7 · Answer 1 · G. Grothendieck · 2014-07-02T10:49:50.043

Set up a custom class that treats empty fields as 0. With that in place, reading the data is a single read.csv call:

# test data
Lines <- "V1,V2,V3
0.3,1.2,1.5
0.5,,2.1
,.1,
"

# set up custom class
setClass("empty.is.0")
setAs("character", "empty.is.0", 
      function(from) replace(as.numeric(from), from == "", 0))

# one liner
read.csv(text = Lines, strip.white = TRUE, colClasses = "empty.is.0")
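
As a quick sanity check (a sketch, not part of the original answer), the result of the custom class can be compared with the read-then-replace approach from the question. The names via_class and via_na are made up for illustration.

```r
library(methods)  # setClass/setAs live here; attached by default in interactive R

# same test data as above
Lines <- "V1,V2,V3
0.3,1.2,1.5
0.5,,2.1
,.1,
"

# custom class: empty strings become 0, everything else goes through as.numeric
setClass("empty.is.0")
setAs("character", "empty.is.0",
      function(from) replace(as.numeric(from), from == "", 0))

# read with the custom class
via_class <- read.csv(text = Lines, strip.white = TRUE, colClasses = "empty.is.0")

# read normally, then replace the NAs afterwards
via_na <- read.csv(text = Lines)
via_na[is.na(via_na)] <- 0

# both routes should give the same zero-filled numeric data frame
print(all.equal(via_class, via_na))
```

Note that any non-empty field that is not a valid number still comes out as NA (with a warning from as.numeric), so this only covers the "blank means zero" case.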

edited Jul 02 '14 at 10:49

answered Jan 03 '14 at 17:48

G. Grothendieck


score 3 · Answer 2 · answered Jan 03 '14 at 17:15

If this is something you do often, write a wrapper function that reads the file and then converts the NAs.

my_read = function(..., replace=0) {
  data = fread(...)
  data[is.na(data)] = replace
  data
}

Or, to be more general and work with any reader function:

my_gen_read = function(..., FUN="fread", replace=0) {
  FUN = match.fun(FUN)
  data = FUN(...)
  data[is.na(data)] = replace
  data
}
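
Since the question notes that data[is.na(data)] = replace is slow on a wide table, here is a sketch of the same wrapper idea that fills each column in place with data.table's set() instead. It assumes data.table is installed and recent enough for fread to accept text=; read_zero_filled and fill_value are hypothetical names, and no timings were measured here.

```r
library(data.table)

# Hypothetical wrapper: read with fread, then fill NAs column by column.
# set() modifies the table by reference, so no full copy is made.
read_zero_filled <- function(..., fill_value = 0) {
  dt <- fread(...)
  for (j in seq_along(dt)) {
    na_rows <- which(is.na(dt[[j]]))
    if (length(na_rows) > 0L) {
      set(dt, i = na_rows, j = j, value = fill_value)
    }
  }
  dt
}

# the question's data, passed through to fread via text=
dt <- read_zero_filled(text = "V1,V2,V3
0.3,1.2,1.5
0.5,,2.1
,.1,")
print(dt)
```

This assumes every column is numeric, as in the question; a column that fread reads as some other type (for example an all-empty column) may be coerced or trigger a warning when 0 is assigned.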

score 2 · Answer 3 · answered Jan 03 '14 at 18:07

I suggest using standard compression tools instead of creating your own:

dt = data.table(a = 1:10) # your data.table

zf = gzfile('filename.gz', 'w') # or bzfile or xzfile
write.csv(dt, zf, quote = FALSE, row.names = FALSE)
close(zf)

# then read either with read.csv or fread (version 1.8.11+)
df = read.csv('filename.gz')
dt = fread('zcat filename.gz')
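
To get a feel for how much gzip saves when the zeros are written out explicitly, something like the following base R sketch can be run. The data is synthetic (about 10% non-zero cells, 336 rows as in the question but only 200 columns to keep it quick), and all variable names are made up for illustration.

```r
set.seed(1)

# synthetic sparse-ish numeric table: mostly zeros
n_row <- 336
n_col <- 200
m <- matrix(0, n_row, n_col)
nonzero <- sample(length(m), size = length(m) %/% 10)
m[nonzero] <- round(runif(length(nonzero)), 1)
df <- as.data.frame(m)

plain_path <- tempfile(fileext = ".csv")
gz_path <- tempfile(fileext = ".csv.gz")

# plain CSV with explicit zeros
write.csv(df, plain_path, quote = FALSE, row.names = FALSE)

# the same CSV written through a gzip connection
con <- gzfile(gz_path, "w")
write.csv(df, con, quote = FALSE, row.names = FALSE)
close(con)

# read the compressed file back; no NA handling is needed
back <- read.csv(gzfile(gz_path))
print(all.equal(df, back))

# compare the on-disk sizes in bytes (plain first, then gzipped)
sizes <- file.size(c(plain_path, gz_path))
print(sizes)
```

The actual ratio depends on the data, so it is worth running this on a real file before committing to either format.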
