1

I have a csv file that looks like this:

A, B,  C, 
1, 2 1, 3,
3, 1, 0, 
4, 1, 0 5,
 ...

is it possible to set the na.string to assign all fields with space to NA (e.g. something like regex function(x){x[grep(patt="\\ ", x)]<-NA;x}), i.e.

A, B, C,
1, NA, 3,
3, 1, 0,
4, 1, NA,
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
ceoec
  • 203
  • 2
  • 10
  • Could you have a field contain only spaces, or will you always have at least another character? – Aaron May 19 '16 at 09:45
  • there are no fields with only spaces -- always have at least other characters... – ceoec May 19 '16 at 09:48

3 Answers3

2

We can loop over the columns and set it to NA by converting to numeric

df1[] <- lapply(df1, as.numeric)

NOTE: Here, I assumed that the columns are character class. If it is factor, do lapply(df1, function(x) as.numeric(as.character(x)))

akrun
  • 874,273
  • 37
  • 540
  • 662
2

Variation on @akrun's answer (which I like).

library(dplyr)
read.csv("test.csv", colClasses="character") %>% mutate_each(funs(as.numeric))

This reads the file assuming all columns are character, then converts all to numeric with mutate_each from dplyr.

Using colClasses="numeric" already in the read call didn't work (and I don't know why :( ), since

> as.numeric("2 1")
[1] NA

From How to read data when some numbers contain commas as thousand separator? we learn that we can make a new function to do the conversion.

setAs("character", "numwithspace", function(from) as.numeric(from) )
read.csv("test.csv", colClasses="numwithspace")

which gives

  A  B  C
1 1 NA  3
2 3  1  0
3 4  1 NA
Community
  • 1
  • 1
bytesinflight
  • 1,624
  • 2
  • 17
  • 28
1

I don't know how this would translate in , but I would use the following regex to match fields containing spaces :

[^, ]+ [^, ]+

Which is :

  • some characters other than a comma or a space ([^, ]+)
  • followed by a space ()
  • and some more characters other than a comma or a space ([^, ]+)

You can see it in action here.

Aaron
  • 24,009
  • 2
  • 33
  • 57