I have a small data set with one column stored as character (chr). The data are shown below.

    test <- structure(list(txtVALUE = c("<5", "<5", "8", "<5", "9", "12",
                                        "45", "5", "<5", "<5", "11,478", "117",
                                        "1,526", "1,642", "3,920", "98", "8",
                                        "<5", "<5", "<5", "<5")),
                      row.names = c(NA, -21L),
                      class = c("tbl_df", "tbl", "data.frame"))

Now I want to convert this column from chr to numeric. I tried the command below:

      test$txtVALUE<-as.numeric(test$txtVALUE)
Warning message:
NAs introduced by coercion 

But this command does not convert the data as I expected. Namely, values such as "1,526", "1,642", and "3,920" are converted to NA, although they are numbers.

Can anybody help me convert this column from chr to numeric properly, so that the actual numbers do not become NA?

silent_hunter
  • What number do you want `"<5"` to be represented as? – SamR Feb 02 '23 at 20:34
  • If you want to ignore punctuation, you could use `readr::parse_number(test$txtVALUE)`. But that does turn `<5` into `5`, which may or may not be desirable. – MrFlick Feb 02 '23 at 20:35
  • @SamR It is not a problem if "<5" becomes NA; the problem is numerical values such as "1,526", "1,642", and "3,920". – silent_hunter Feb 02 '23 at 20:38
  • Pretty much a duplicate of https://stackoverflow.com/questions/1523126/how-to-read-data-when-some-numbers-contain-commas-as-thousand-separator – thelatemail Feb 02 '23 at 20:48

1 Answer

Your data appear to be counts, so I have taken the slight liberty of assuming that the values are always whole numbers. If they are not, do not use this approach, as it will delete decimal points as well.

If they are, and since you want "<5" to become NA, you can use gsub() to replace every value that starts with "<" with an empty string, and also to delete any remaining character that is not a digit (e.g. the comma in "11,478").

Of course gsub() produces a character vector, so wrap it in as.integer(). An empty string is converted to NA.

as.integer(gsub("\\D|<.+", "", test$txtVALUE))
#  [1]    NA    NA     8    NA     9    12    45     5    NA    NA 11478   117  1526  1642  3920    
# [16]    98     8    NA    NA    NA    NA
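
If the values might contain decimals, a more explicit variant may be safer. The sketch below is not part of the approach above: `to_number` is a hypothetical helper name, and the vector `x` is a shortened copy of the values from the question. It assumes that "," is only ever a thousands separator and that anything containing "<" should become NA.

```r
# A few of the values from the question
x <- c("<5", "8", "11,478", "1,526", "3,920", "98")

# Hypothetical helper (an illustration, not from the answer above):
# values such as "<5" become NA, thousands separators are dropped,
# and decimal points are left untouched.
to_number <- function(v) {
  censored <- grepl("<", v, fixed = TRUE)     # flag values like "<5"
  cleaned  <- gsub(",", "", v, fixed = TRUE)  # "11,478" -> "11478"
  cleaned[censored] <- NA_character_          # explicit NA, so no coercion warning
  as.numeric(cleaned)
}

result <- to_number(x)
result
```

Because the "<" values are set to NA before the conversion, as.numeric() should not raise the "NAs introduced by coercion" warning here, and a value such as "1,526.5" keeps its fractional part.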
SamR