
I have a data frame with six columns, saved as a CSV file. Two of the columns are very sparse and contain a lot of blanks (which I'd like to be NAs). One sparse column, flops, also has a very wide range of values (as low as 500 and as high as 93000000000000000).

I have tried various solutions from here and here with no luck. For some reason, only the 500 data point gets preserved.

For example:

> DATA$flops2 <- as.numeric(levels(DATA$flops))
Error in `$<-.data.frame`(`*tmp*`, flops2, value = c(NA, NA, NA, NA, NA,  : 
  replacement has 14 rows, data has 79
In addition: Warning message:
NAs introduced by coercion 
> is.numeric(flops2)
[1] TRUE
> flops2
 [1]  NA  NA  NA  NA  NA  NA  NA 500  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
[21]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
[41]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
[61]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
> flops
 [1]                                                                     
 [4]                                                                     
 [7]                        500                                          
[10]                                                                     
[13]                                                                     
[16]                                                                     
[19]                                                                     
[22]                                                                     
[25]                                               3,000,000             
[28]                                               5,000,000             
[31]                                                                     
[34]                                                                     
[37]                        160,000,000                                  
[40]                                                                     
[43]                        800,000,000                                  
[46]                        1,900,000,000                                
[49]                                                                     
[52]                                                                     
[55]                                                                     
[58]                        2,000,000,000,000                            
[61]                                               7,000,000,000,000     
[64] 36,000,000,000,000                                                  
[67] 470,000,000,000,000                                                 
[70]                                                                     
[73]                        16,000,000,000,000,000 34,000,000,000,000,000
[76]                                               93,000,000,000,000,000
[79]                       
14 Levels:  1,900,000,000 16,000,000,000,000,000 160,000,000 ... 93,000,000,000,000,000

The same or similar happens for most of the conversion techniques.

wugology

2 Answers


The error arises because levels(DATA$flops) returns one value per distinct level (14 here), while the column has 79 rows. Index the converted levels by the factor itself to expand the result to the full length:

DATA$flops2 <- as.numeric(levels(DATA$flops))[DATA$flops]

e.g.

set.seed(24)
v1 <- factor(sample(1:3, 10, replace = TRUE))
as.numeric(levels(v1))[v1]
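To see why the indexing step matters, here is a minimal sketch with a made-up factor (the name `f` and its values are illustrative, not from the question). Indexing a vector by a factor uses the factor's integer codes, so each row picks up the numeric value of its own level:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "500", "10", "2000"))

# Levels are the distinct labels, sorted as strings
lv <- levels(f)             # "10" "2000" "500"

# as.numeric() on the factor itself returns the internal codes, not the labels
codes <- as.numeric(f)      # 1 3 1 2

# Converting the levels and indexing by the factor recovers the real values
values <- as.numeric(lv)[f] # 10 500 10 2000
```

Note that `values` has the same length as `f`, whereas `as.numeric(lv)` alone has only as many elements as there are levels.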

Based on the input shown, the numeric entries contain , as a thousands separator, so as.numeric() returns NA for them. Remove the commas first and then convert to numeric:

DATA$flops2 <- as.numeric(gsub(",", "", DATA$flops))
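A small self-contained sketch of that conversion, assuming a toy `DATA` with the same kind of content as the question (blank entries and comma-separated numbers). The intermediate name `cleaned` is illustrative only:

```r
# Toy stand-in for the question's column: blanks plus comma-formatted numbers
DATA <- data.frame(
  flops = factor(c("", "500", "3,000,000", "", "93,000,000,000,000,000"))
)

# Work on the labels as text, and strip the thousands separators
cleaned <- gsub(",", "", as.character(DATA$flops), fixed = TRUE)

# Make the blanks explicit missing values before converting
cleaned[cleaned == ""] <- NA

# The result is stored as double precision
DATA$flops2 <- as.numeric(cleaned)
```

Because the result is double precision, integers above 2^53 are not all exactly representable, which may matter if exact values at the top of this range are needed.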
akrun

The unfactor() function from the varhandle package works, but the result is still character, not numeric.

> install.packages("varhandle")
> library(varhandle)
> DATA$flops2 <- unfactor(DATA$flops)
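One possible way to avoid the factor altogether is to handle it at import time. The sketch below assumes the CSV quotes the comma-formatted numbers; the inline `csv_text` and the `name` column are made up for illustration. With `na.strings = ""`, blank fields are read as NA, and `stringsAsFactors = FALSE` keeps the column as character so the commas can be stripped directly:

```r
# Hypothetical CSV content; numbers with commas must be quoted to parse as one field
csv_text <- 'name,flops
a,
b,500
c,"3,000,000"
d,"93,000,000,000,000,000"'

# Blank fields become NA; text columns stay character instead of factor
DATA <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Strip the thousands separators and convert; NA entries stay NA
DATA$flops2 <- as.numeric(gsub(",", "", DATA$flops, fixed = TRUE))
```

For a real file, `text = csv_text` would be replaced by the file path.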
wugology