I have a data.frame called tmp. Here is the summary:

> summary(tmp)
 Organization       Advance Monthly Sales     Other       Homeownership Rate
 Length:2460        Min.   :  0           Min.   :    0   Min.   :   0      
 Class :character   1st Qu.:  0           1st Qu.:    0   1st Qu.:   0      
 Mode  :character   Median :  0           Median :    2   Median :   0      
                    Mean   :  1           Mean   :   53   Mean   :   3      
                    3rd Qu.:  0           3rd Qu.:   14   3rd Qu.:   0      
                    Max.   :637           Max.   :34622   Max.   :3272      
 New Residential Construction New Residential Sales Construction Spending
 Min.   :   0                 Min.   :   0          Min.   :    0        
 1st Qu.:   0                 1st Qu.:   0          1st Qu.:    0        
 Median :   0                 Median :   0          Median :    0        
 Mean   :  10                 Mean   :   1          Mean   :   83        
 3rd Qu.:   0                 3rd Qu.:   0          3rd Qu.:    0        
 Max.   :9078                 Max.   :1856          Max.   :60630        
 U.S. International Manufacturing and Trade Advance Report on Durable Goods
 Min.   :    0      Min.   :  0             Min.   :   0                   
 1st Qu.:    0      1st Qu.:  0             1st Qu.:   0                   
 Median :    0      Median :  0             Median :   0                   
 Mean   :   18      Mean   :  0             Mean   :   2                   
 3rd Qu.:    3      3rd Qu.:  0             3rd Qu.:   0                   
 Max.   :11992      Max.   :874             Max.   :4785                   
 Quarterly Financial Report Advance U.S. Intl Trades Monthly Wholesale Trade
 Min.   :  0                Min.   :  0              Min.   :  0            
 1st Qu.:  0                1st Qu.:  0              1st Qu.:  0            
 Median :  0                Median :  0              Median :  0            
 Mean   :  0                Mean   :  0              Mean   :  0            
 3rd Qu.:  0                3rd Qu.:  0              3rd Qu.:  0            
 Max.   :478                Max.   :849              Max.   :697            
 Quarterly Services Survey Business Formation Statistics     Total  
 Min.   :  0               Min.   :  0                   Min.   :0  
 1st Qu.:  0               1st Qu.:  0                   1st Qu.:0  
 Median :  0               Median :  0                   Median :0  
 Mean   :  0               Mean   :  0                   Mean   :0  
 3rd Qu.:  0               3rd Qu.:  0                   3rd Qu.:0  
 Max.   :423               Max.   :233                   Max.   :0

I'm using this command to create a column "Total":

tmp$Total <- rowSums(tmp[, -1])

And then I see this output:

> head(tmp, 1)
                          Organization Advance Monthly Sales Other
1 VeriSign Infrastructure & Operations                     1     0
  Homeownership Rate New Residential Construction New Residential Sales
1                  0                            0                     0
  Construction Spending U.S. International Manufacturing and Trade
1                     0                  0                       0
  Advance Report on Durable Goods Quarterly Financial Report
1                               0                          0
  Advance U.S. Intl Trades Monthly Wholesale Trade Quarterly Services Survey
1                        0                       0                         0
  Business Formation Statistics         Total
1                             0 4.940656e-324

I know the wrapped output doesn't look nice, but you can see that the row should sum to 1. Instead I'm ending up with this extremely small value (4.940656e-324). Am I doing something wrong here?

*** EDIT ***

> dput(head(tmp, 1))
structure(list(Organization = "VeriSign Infrastructure & Operations", 
    `Advance Monthly Sales` = structure(4.94065645841247e-324, class = "integer64"), 
    `New Residential Sales` = structure(0, class = "integer64"), 
    `U.S. International` = structure(0, class = "integer64"), 
    Other = structure(0, class = "integer64"), `New Residential Construction` = structure(0, class = "integer64"), 
    `Advance Report on Durable Goods` = structure(0, class = "integer64"), 
    `Homeownership Rate` = structure(0, class = "integer64"), 
    `Construction Spending` = structure(0, class = "integer64"), 
    `Manufacturing and Trade` = structure(0, class = "integer64"), 
    `Quarterly Financial Report` = structure(0, class = "integer64"), 
    `Advance U.S. Intl Trades` = structure(0, class = "integer64"), 
    `Monthly Wholesale Trade` = structure(0, class = "integer64"), 
    `Quarterly Services Survey` = structure(0, class = "integer64"), 
    `Business Formation Statistics` = structure(0, class = "integer64")), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

*** EDIT 2 ***

Some more output:

> tmp$"Advance Monthly Sales"
integer64
[1] 1   0   0   1   0   0   0   0   0   2   0   0   9   0   0   0   0   0  
[19] 0   0   0   0   1   0   0   0   0   1   0   0   0   8   0   0   0   1   
[37] 0   0   1   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0  
[55] 1   0   0   0   0   0   1   0   0   0   4   0   0   0   0   0   0   0  
[73] 0   1   13  0   0   0   0   0   0   0   0   0   2   0   0   0   0   0  
[91] 0   0   14  0   0   0   1   0   9   0   0   0   0   0   1   0   0   0 

> tmp$"Advance Monthly Sales" %>% class()
[1] "integer64"
> tmp2 <- tmp
> tmp2$"Advance Monthly Sales" <- as.numeric(tmp2$"Advance Monthly Sales")
> tmp2$"Advance Monthly Sales" %>% class()
[1] "numeric"
> dput(head(tmp2, 1))
structure(list(Organization = "VeriSign Infrastructure & Operations", 
    `Advance Monthly Sales` = 1, `New Residential Sales` = structure(0, class = "integer64"), 
    `U.S. International` = structure(0, class = "integer64"), 
    Other = structure(0, class = "integer64"), `New Residential Construction` = structure(0, class = "integer64"), 
    `Advance Report on Durable Goods` = structure(0, class = "integer64"), 
    `Homeownership Rate` = structure(0, class = "integer64"), 
    `Construction Spending` = structure(0, class = "integer64"), 
    `Manufacturing and Trade` = structure(0, class = "integer64"), 
    `Quarterly Financial Report` = structure(0, class = "integer64"), 
    `Advance U.S. Intl Trades` = structure(0, class = "integer64"), 
    `Monthly Wholesale Trade` = structure(0, class = "integer64"), 
    `Quarterly Services Survey` = structure(0, class = "integer64"), 
    `Business Formation Statistics` = structure(0, class = "integer64")), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

Then I try tmp2$Total <- rowSums(tmp2[, -1]) again and I still get the following:

> head(tmp2$Total, 20)
 [1]  1.000000e+00 1.976263e-322 4.940656e-324  1.000000e+00 4.940656e-324
 [6] 6.958915e-320 3.952525e-323 4.940656e-324 3.458460e-323  2.000000e+00
[11] 1.037538e-322 1.235164e-322  9.000000e+00 4.940656e-324 4.940656e-324
[16] 4.940656e-324 1.828043e-322 4.940656e-324 4.940656e-324 4.001932e-322
bcstryker
  • 1
    does `base::rowSums(tmp[, -1])` give you the same value? you can also add `dput(head(tmp, 1))` to your question – rawr Feb 16 '21 at 20:36
  • 1
    @FrsLry, it's legit, see [`?Extract`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.html) (which includes `?[` and `?[[`) and look for *"negative"*. – r2evans Feb 16 '21 at 20:47
  • 1
    Okay, I'm intrigued now. That looks to be the literal minimum value that R can represent - see `?.Machine`: "*On a typical R platform the smallest positive double is about ‘5e-324’*" as per here too - https://stackoverflow.com/questions/38165221/r-largest-smallest-representable-numbers – thelatemail Feb 16 '21 at 21:33
  • bcstryker, can you add the output from `dput(head(tmp,1))` please? – r2evans Feb 16 '21 at 21:49
  • What is the output of `str(tmp)`? If everything on the question is correct, then I can only see variable types as a possible problem. – VFreguglia Feb 16 '21 at 22:00
  • `unclass(as.integer64(1))` gives 4.940656e-324. This is where I would look. – pseudospin Feb 16 '21 at 22:25
  • Hey Guys! Thanks for the responses. I added the output of dput(head(tmp, 1)) to the question. I see the issue is with the Advance Monthly Sales because that is the only thing that seems out of place but I tried what @pseudospin recommended below which fixed that column but then there were other issues. See below. – bcstryker Feb 17 '21 at 23:35

2 Answers

1

This is my guess at what is happening:

df <- data.frame(x = bit64::as.integer64(1), y = 0)
print(df)
#>   x y
#> 1 1 0
rowSums(df)
#> [1] 4.940656e-324

Whichever method you are using to import your data is probably making Advance Monthly Sales a 64-bit integer (class integer64 from the bit64 package).

The simplest remedy is to convert that column to a double with as.numeric().
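
To see where 4.940656e-324 comes from, here is a minimal sketch (assuming the bit64 package is installed; x is just an illustrative name). bit64 keeps the 64 bits of each integer inside an ordinary double vector and relies on the class attribute to interpret them, so any code that ignores the class reads those bits as a double:

```r
library(bit64)

x <- as.integer64(1)

# With the class attribute in place, the value prints as the integer 1.
print(x)

# Without the class, the same 64 bits are read as a double.
# The bit pattern of integer 1 is the smallest positive (denormal) double.
unclass(x)
#> [1] 4.940656e-324

# as.numeric() uses bit64's conversion method and returns a real 1.
as.numeric(x)
#> [1] 1
```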


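It looks like every column is integer64. The problem is that rowSums ignores the integer64 class: it adds up the underlying doubles and returns a plain numeric vector. If every column being summed is integer64, an easy solution is just to put the class back on the result: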

library(bit64)
df <- data.frame(x = as.integer64(1), y = as.integer64(2))
df$z <- rowSums(df)
print(df)
#>   x y             z
#> 1 1 2 1.482197e-323
class(df$z) <- 'integer64'
df
#>   x y z
#> 1 1 2 3

Or you can convert every column to doubles with as.numeric.
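
A minimal sketch of that second option, assuming bit64 is installed (org and is_i64 are names made up for illustration). It converts only the integer64 columns, so a character column is left alone. Keep in mind that doubles represent integers exactly only up to 2^53, so this suits counts like these but not very large IDs.

```r
library(bit64)

df <- data.frame(org = "a", x = as.integer64(1), y = as.integer64(2))

# Flag the integer64 columns; inherits() returns TRUE or FALSE per column.
is_i64 <- vapply(df, inherits, logical(1), what = "integer64")

# Convert only those columns to ordinary doubles.
df[is_i64] <- lapply(df[is_i64], as.numeric)

# rowSums now sees real doubles, so the sum is what you expect.
df$z <- rowSums(df[is_i64])
unname(df$z)
#> [1] 3
```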

pseudospin
  • Hey @pseudospin I tried what you recommended and had the same final output. See the question second edit. tmp$"Advance Monthly Sales" was definitely integer64 but now that it's numeric I'm still getting these strange outputs from rowSums(). – bcstryker Feb 17 '21 at 23:43
1

I figured out what I had to do:

> is.integer64 <- function(x){
  # inherits() also works for columns that carry more than one class
  inherits(x, "integer64")
}
> sel <- sapply(tmp, is.integer64)
> tmp[sel] <- lapply(tmp[sel], as.numeric)
> dput(head(tmp, 1))
structure(list(Organization = "VeriSign Infrastructure & Operations", 
    `Advance Monthly Sales` = 1, `New Residential Sales` = 0, 
    `U.S. International` = 0, Other = 0, `New Residential Construction` = 0, 
    `Advance Report on Durable Goods` = 0, `Homeownership Rate` = 0, 
    `Construction Spending` = 0, `Manufacturing and Trade` = 0, 
    `Quarterly Financial Report` = 0, `Advance U.S. Intl Trades` = 0, 
    `Monthly Wholesale Trade` = 0, `Quarterly Services Survey` = 0, 
    `Business Formation Statistics` = 0, Total = 1), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))
> tmp$Total <- rowSums(tmp[, -1])
> head(tmp2$Total,20)
 [1]     2    40     1     8     1 14085     8     1     7    41    21    25
[13]   129     1     1     1    37     1     1    81
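
One thing to watch when rerunning that last step, sketched here with a made-up data frame (df, id and value_cols are hypothetical names): tmp[, -1] selects every column except the first, so if a Total column already exists, the old Total gets added into the new one. Selecting the columns to sum by name avoids that:

```r
df <- data.frame(id = c("a", "b"), x = c(1, 2), y = c(3, 4))

# Name the columns to sum once, excluding the label and any existing Total.
value_cols <- setdiff(names(df), c("id", "Total"))

df$Total <- rowSums(df[value_cols])
df$Total <- rowSums(df[value_cols])  # rerunning does not double count

unname(df$Total)
#> [1] 4 6
```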

Thanks again everyone!

bcstryker