18

I have a question about how to avoid NA when using the as.numeric function in R. As you can see below, I have a character variable, cumulative_viewers, whose values are all numbers, and I wanted to convert it to numeric with as.numeric, but it did not work properly. The problem appears when a value has four or more digits: as.numeric returns NA even though the value looks numeric. For example, as.numeric works well with values such as '999' or '997', but when the value has four or more digits, such as '1000', '1001' or '999999', it returns NA instead of the real numeric value.

Could anyone please help me solve this problem? I spent a day on it but have not found an answer yet.

paste(data_without_duplicates$cumulative_viewers)

    [1] "12,983,336" "12,323,294" "11,375,954" "10,917,221" "10,667,700"
    [6] "10,292,386" "9,350,192"  "9,135,520"  "9,001,309"  "8,653,415" 
    [11] "7,784,755"  "7,508,976"  "7,362,790"  "6,959,047"  "6,706,543" 
    .....
    [1426] "1,026"      "1,024"      "1,023"      "1,020"      "1,017"     
    [1431] "1,016"      "1,013"      "1,011"      "1,001"      "1,000"     
    [1436] "1,000"      "999"        "997"        "994"        "990"       
    [1441] "989"        "988"        "984"        "982"        "979"       
    [1446] "974"        "972"        "971"        "966"        "961"       


as.numeric(data_without_duplicates$cumulative_viewers)

    [1]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
    [18]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
    [35]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
    .......
    [1395]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
    [1412]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
    [1429]  NA  NA  NA  NA  NA  NA  NA  NA 999 997 994 990 989 988 984 982 979
    [1446] 974 972 971 966 961 959 958 957 950 946 941 930 929 911 911 910 910
    [1463] 910 907 907 902 898 897 895 892 890 890 889 885 885 883 872 871 868
pdobb
Jacob Green

2 Answers

26

It's not really an issue with the number of digits, just the fact that your numbers with four or more digits have commas in them:

N1 <- c("1000", "1,000", "10000", "10,000")
as.numeric(N1)
## [1]  1000    NA 10000    NA
## Warning message:
## NAs introduced by coercion

N2 <- gsub(",", "", N1)
as.numeric(N2)
## [1]  1000  1000 10000 10000
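
If it is not obvious which values are failing to convert, one way to check is to compare the input against the converted result. The following is only a minimal sketch in base R; the vector `x` and the names `converted` and `failed` are made up for illustration and are not part of the original data.

```r
# Hypothetical sample values; some contain thousands separators
x <- c("999", "1,000", "12,983,336", "997")

# Convert quietly, then flag entries that were not NA before but are NA after
converted <- suppressWarnings(as.numeric(x))
failed <- is.na(converted) & !is.na(x)

# The strings that as.numeric could not parse
x[failed]

# Do the failing strings contain a comma?
grepl(",", x[failed], fixed = TRUE)
```

Here the flagged strings are exactly the ones containing commas, which points to the separators rather than the number of digits.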
nrussell
7

It looks to me as if the commas in your data are the issue. There are probably dozens of ways of dealing with this.

Here's one:

x <- c("12,983,336", "12,323,294", "11,375,954", "10,917,221", "10,667,700", 
       "10,292,386", "9,350,192", "9,135,520", "9,001,309", "8,653,415", 
       "7,784,755", "7,508,976", "7,362,790", "6,959,047", "6,706,543", 
       "1,026", "1,024", "1,023", "1,020", "1,017", "1,016", "1,013", 
       "1,011", "1,001", "1,000", "1,000", "999", "997", "994", "990", 
       "989", "988", "984", "982", "979", "974", "972", "971", "966", 
       "961")

as.numeric(gsub(",","",x,fixed=TRUE))
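
To apply the same idea to the column from the question, the cleaned values can be written back into the data frame. This is a rough sketch only: the small data frame below is a hypothetical stand-in for `data_without_duplicates`, and the intermediate name `cleaned` is invented for illustration.

```r
# Hypothetical stand-in for the asker's data frame
data_without_duplicates <- data.frame(
  cumulative_viewers = c("12,983,336", "1,000", "999"),
  stringsAsFactors = FALSE
)

# Strip the thousands separators, then convert to numeric
cleaned <- as.numeric(
  gsub(",", "", data_without_duplicates$cumulative_viewers, fixed = TRUE)
)

# Stop early if any value still failed to convert
stopifnot(!anyNA(cleaned))

# Overwrite the character column with the numeric one
data_without_duplicates$cumulative_viewers <- cleaned
str(data_without_duplicates)
```

Checking for NA before overwriting the column means an unexpected value (for example, a stray space or a currency symbol) is noticed instead of silently becoming NA.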
jalapic