
I have received a CSV table of compound concentrations. Scattered throughout are character values with various meanings, such as `> 888`, `<0.2`, `/`, and many more.

Is there a way, preferably using base R or readr, to convert these to NA while reading in and thus start from numeric data only?

At the moment I can only find a solution that relies on hard-coding every character string, which would be too difficult and time-consuming.

Joe
  • Have you tried to specify those as `na.strings` when reading the csv file? – talat Apr 27 '17 at 12:50
  • This is the solution in the linked answer, but I was wondering how this could be applied to any character string rather than specified ones. – Joe Apr 27 '17 at 12:54
  • The accepted answer in the linked post is different. The question then is whether you have any way of knowing what values there can be. If not, I guess you'll have to read them completely and convert later using as.numeric – talat Apr 27 '17 at 12:56
  • No, they're messy and having non-numeric characters is all they have in common. Probably I'll just have to read them and convert as you say. – Joe Apr 27 '17 at 12:57

1 Answer


Once you have read them in, just use `as.numeric`. Anything that cannot be parsed as a number becomes NA:

a <- c("1","2","3",">4","5","6-7","8+","9")

as.numeric(a)
# [1]  1  2  3 NA  5 NA NA  9
# Warning message: NAs introduced by coercion
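The same idea can be applied to a whole table. The following is only a minimal sketch in base R, assuming a made-up file layout: the column names `compound`, `conc_a` and `conc_b` and the inline `csv_text` are hypothetical stand-ins for the real data. Every column is read as character first, and then all columns except the label column are coerced.

```r
# Hypothetical example data standing in for the real csv file
csv_text <- "compound,conc_a,conc_b
A,1.5,> 888
B,<0.2,3
C,/,4.25"

# Read everything as character so nothing is silently turned into a factor
raw <- read.csv(text = csv_text, colClasses = "character", strip.white = TRUE)

# Coerce every column except the (assumed) label column;
# values that are not plain numbers become NA
num_cols <- setdiff(names(raw), "compound")
clean <- raw
clean[num_cols] <- lapply(raw[num_cols], function(x) suppressWarnings(as.numeric(x)))

str(clean)
```

`suppressWarnings()` only hides the "NAs introduced by coercion" message. Dropping it may be preferable while checking how many values were lost.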
Andrew Gustar
  • Thanks, but out of curiosity I wondered if it could be done during the process of reading in. (No doubt there are many ways to achieve this once the data are already in R.) – Joe Apr 27 '17 at 12:55
  • I don't know of an easy way - you would have to read each item in anyway, so that the code can decide whether to accept or reject it, so in terms of efficiency it makes sense to read everything in and then do the tidying. – Andrew Gustar Apr 27 '17 at 13:00
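
For the read-time conversion asked about in the question, one possibility with readr may be to declare the columns as doubles up front. As far as I know, readr then records each value it cannot parse as a parsing problem and stores NA in its place. This is a sketch under that assumption, and the column names and example data are again hypothetical.

```r
library(readr)

# Hypothetical example data written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("compound,conc_a,conc_b",
             "A,1.5,> 888",
             "B,<0.2,3",
             "C,/,4.25"), tmp)

# Keep the (assumed) label column as character, parse everything else as double;
# unparseable entries are expected to come back as NA with a parsing warning
dat <- read_csv(tmp,
                col_types = cols(compound = col_character(),
                                 .default = col_double()))

# List the values that failed to parse, for checking what was discarded
problems(dat)
```

The file is still read value by value, as the comment above points out, so this is mainly a convenience. The `problems()` table gives a record of which entries were turned into NA.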