read csv setting fields with spaces to NA

Question

I have a csv file that looks like this:

A, B,  C, 
1, 2 1, 3,
3, 1, 0, 
4, 1, 0 5,
 ...

Is it possible to set na.strings so that every field containing a space is read as NA (e.g. with something like a regex function, function(x){x[grep(patt="\\ ", x)]<-NA;x}), i.e.

A, B, C,
1, NA, 3,
3, 1, 0,
4, 1, NA,

Could you have a field contain only spaces, or will you always have at least another character? — Aaron, May 19 '16 at 09:45
there are no fields with only spaces -- always have at least other characters... — ceoec, May 19 '16 at 09:48

akrun · Accepted Answer · 2016-05-19T10:06:21.303

2

We can loop over the columns and convert each one with as.numeric. Any value that cannot be parsed as a number, such as "2 1", becomes NA (with a coercion warning).

df1[] <- lapply(df1, as.numeric)

NOTE: Here, I assumed that the columns are of class character. If they are factors, use lapply(df1, function(x) as.numeric(as.character(x)))
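
To make the idea concrete, here is a minimal self-contained sketch. The inline sample `csv_text` is an assumption standing in for the asker's file (simplified, without the trailing commas), and the columns are read as character so the conversion is explicit:

```r
# Hypothetical inline sample standing in for the asker's csv file
csv_text <- "A,B,C
1,2 1,3
3,1,0
4,1,0 5"

# Read every column as character so nothing is converted implicitly
df1 <- read.csv(text = csv_text, colClasses = "character")

# as.numeric() returns NA for strings such as "2 1";
# suppressWarnings() only hides the "NAs introduced by coercion" warning
df1[] <- lapply(df1, function(x) suppressWarnings(as.numeric(x)))

str(df1)
```

With this sample, column B comes back as NA, 1, 1 and column C as 3, 0, NA, all of type numeric.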

edited May 19 '16 at 10:06

answered May 19 '16 at 09:50


it turns all the numeric fields to string, i.e. `"1", NA, "3"` instead of `1, NA, 3` – ceoec May 19 '16 at 10:01

as.numeric! should have thought of this simple method! huge thanks! – ceoec May 19 '16 at 11:55

score 2 · Answer 2 · edited May 23 '17 at 12:31

Variation on @akrun's answer (which I like).

library(dplyr)
read.csv("test.csv", colClasses="character") %>% mutate_each(funs(as.numeric))

This reads the file assuming all columns are character, then converts all to numeric with mutate_each from dplyr.
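
A note for readers on current dplyr: mutate_each() and funs() have since been deprecated. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed and using a made-up inline sample (`csv_text`) in place of test.csv:

```r
library(dplyr)

# Hypothetical inline sample standing in for test.csv
csv_text <- "A,B,C
1,2 1,3
3,1,0
4,1,0 5"

df2 <- read.csv(text = csv_text, colClasses = "character") %>%
  # Convert every column; values such as "2 1" become NA.
  # suppressWarnings() only hides the coercion warning.
  mutate(across(everything(), ~ suppressWarnings(as.numeric(.x))))

str(df2)
```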

Setting colClasses="numeric" directly in the read call didn't work (and I don't know why :( ), even though

> as.numeric("2 1")
[1] NA

From "How to read data when some numbers contain commas as thousand separator?" we learn that we can define a new class with its own conversion function.

setAs("character", "numwithspace", function(from) as.numeric(from) )
read.csv("test.csv", colClasses="numwithspace")

which gives the result asked for: fields that contain a space are read as NA.
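
A self-contained sketch of the setAs approach, for anyone who wants to try it without a file on disk. The inline sample `csv_text` is an assumption, and the setClass() call is an addition that is not in the snippet above; it registers the class name first so that R has a definition for "numwithspace" when setAs() is called:

```r
library(methods)  # setClass() and setAs() live here; loaded explicitly for Rscript

# Register the (virtual) class name, then define how character is coerced to it
setClass("numwithspace")
setAs("character", "numwithspace",
      function(from) suppressWarnings(as.numeric(from)))

# Hypothetical inline sample standing in for test.csv
csv_text <- "A,B,C
1,2 1,3
3,1,0
4,1,0 5"

# read.csv() reads each column as character and then calls as(x, "numwithspace")
df3 <- read.csv(text = csv_text, colClasses = "numwithspace")

str(df3)
```

The advantage over converting afterwards is that the conversion happens inside the read call, so the data frame comes back already numeric.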

Aaron · Answer 3 · 2016-05-19T09:57:09.433

1

I don't know how this would translate to R, but I would use the following regex to match fields containing spaces:

[^, ]+ [^, ]+

Which is:

some characters other than a comma or a space ([^, ]+)
followed by a space ( )
and some more characters other than a comma or a space ([^, ]+)

You can see it in action here.
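
For what it's worth, the pattern can be used from R with grepl(). A minimal sketch, assuming the fields are already available as a character vector (the `fields` vector below is made up for illustration):

```r
# Hypothetical field values, as they would look after reading the csv as character
fields <- c("1", "2 1", "3", "0 5")

# TRUE for every field that has a space between two non-space, non-comma runs
has_inner_space <- grepl("[^, ]+ [^, ]+", fields)

# Blank out the matching fields, then convert what is left to numeric
fields[has_inner_space] <- NA
values <- as.numeric(fields)

print(values)
```

Here "2 1" and "0 5" match the pattern and end up as NA, while "1" and "3" are converted normally.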

edited May 19 '16 at 09:57

answered May 19 '16 at 09:50

