Render NA count in dataframe

Question

I want to create a function that returns the type of an n-value (the n-value is the 6th column of a data frame) using the following rules:

# n-value types
missing : NA
n > 0.05 : 'n.s.'
0.05 >= n > 0.01 : '*'
0.01 >= n > 0.001 : '**'
0.001 >= n > 0.0001 : '***'
0.0001 >= n : '****'

The first row of the data looks like:

         n.name    bMean    log2FoldChange    lfcSE        stat            pn         padj
        <fct>      <dbl>      <dbl>           <dbl>         <dbl>         <dbl>       <dbl>
469    TNFRSF1B  542.82545  -3.406411        0.2267235    -15.024517    5.07e-51    3.25e-48

I tried the following:

c.1 <- function(x){
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
  stars <- c("****", "***", "**", "*", "n.s.")
  bins <- cut(x, breaks = breaks, labels = stars, include.lowest = TRUE)
  bins <- as.character(bins)
  list(p = x, stars = bins)
}
tab.1<-table(c.1(nav$pvalue))
apply(tab.1, 2, sum)

I almost got what I want:

*: 24 **: 102 ***: 15 ****: 45 n.s.: 32

Some of the values are NA rather than numeric, and they are not counted in the output, so I tried:

a1<-as.numeric("NA")
c.1 <- function(x){
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1, a1)
  stars <- c("****", "***", "**", "*", "n.s.", "NA")
  bins <- cut(x, breaks = breaks, labels = stars, include.lowest = FALSE)
  bins <- as.character(bins)
  list(p = x, stars = bins)
}
tab.1<-table(c.1(nav$pvalue))
apply(tab.1, 2, sum)

I get an error. How can I get the NA count included in the output?

Waldi · Accepted Answer · 2020-10-01T05:33:31.940

You could use case_when:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

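# case_when() returns the label of the first condition that is TRUE,
# so the thresholds are tested from largest to smallest.
# NA > x is NA (not TRUE), so missing values fall through to is.na(n).
c.1 <- function(n) case_when( n > 0.05 ~ 'ns',
                                n > 0.01 ~ '*',
                                n > 0.001 ~ '**',
                                n > 0.0001 ~ '***',
                                n >= 0 ~ '****',
                                is.na(n) ~ 'missing')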

set.seed(1)
n <- rgeom(10,.1)
n <- n / max(n) / 100
n[sample(1:10,2)]<-NA 
n
#>  [1] 0.0025000000 0.0012500000 0.0095833333 0.0012500000 0.0100000000
#>  [6]           NA 0.0062500000 0.0008333333 0.0083333333           NA
c.1(n)
#>  [1] "**"      "**"      "**"      "**"      "**"      "missing" "**"     
#>  [8] "***"     "**"      "missing"

df <- data.frame(n)

df %>% mutate(signif = c.1(n)) %>%
       select(signif,n) %>%
       group_by(signif) %>%
       summarize(nb = n()) %>%
       ungroup() 
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#>   signif     nb
#>   <chr>   <int>
#> 1 **          7
#> 2 ***         1
#> 3 missing     2

Created on 2020-10-01 by the reprex package (v0.3.0)
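
For readers who prefer to stay in base R, here is a minimal sketch of another route to the NA count. It keeps the `cut()` call from the question and asks `table()` to report missing values through its `useNA` argument. The function name `stars_of` and the sample vector `pvalue` are assumptions made for illustration: the first six values are the ones the asker posted in the comments, and the two NA entries are added here.

```r
# Hypothetical sample data standing in for nav$pvalue
pvalue <- c(0.001772662, 0.917041262, 0.825266165, 0.163495324,
            1.07e-08, 1.44e-18, NA, NA)

# cut() uses right-closed intervals by default, e.g. (0.01, 0.05],
# which matches the rules in the question; NA input stays NA.
stars_of <- function(x) {
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
  labels <- c("****", "***", "**", "*", "n.s.")
  as.character(cut(x, breaks = breaks, labels = labels, include.lowest = TRUE))
}

# useNA = "ifany" adds an NA category when missing values are present
counts <- table(stars_of(pvalue), useNA = "ifany")
counts
```

By default `table()` drops NA silently, which may be why the missing values never showed up in the first attempt.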

edited Oct 01 '20 at 05:33

answered Oct 01 '20 at 05:17

Waldi


Thank you @Waldi! I tried it: `c.1 <- function(n) case_when( n > 0.05 ~ 'ns', n > 0.01 ~ '*', n > 0.001 ~ '**', n > 0.0001 ~ '***', n >=0 ~ '****', is.na(n) ~ 'missing') tab.1<-table(c.1(nav$pvalue)) apply(tab.1, 2, sum)` and got `Error in if (d2 == 0L) {: missing value where TRUE/FALSE needed Traceback: 1. apply(tab.1, 2, sum)` – user432797 Oct 01 '20 at 05:29
see my edit for a perhaps simpler solution with dplyr – Waldi Oct 01 '20 at 05:34
I like your first solution @Waldi, but I need to overcome the error, it keeps showing:`Error in if (d2 == 0L) {: missing value where TRUE/FALSE needed Traceback: 1. apply(tab.1, 2, sum)` – user432797 Oct 01 '20 at 05:42
I tried to reverse the missing to NA, because the missing data is showing NA in the pvalue column: `c.1 <- function(n) case_when( n > 0.05 ~ 'ns', n > 0.01 ~ '*', n > 0.001 ~ '**', n > 0.0001 ~ '***', n >=0 ~ '****', is.na(n) ~ 'NA') tab.1<-table(c.1(nav$pvalue)) apply(tab.1, 2, sum)` but still no luck, the same error. – user432797 Oct 01 '20 at 05:44
could you provide `dput(head(nav$pvalue))`? – Waldi Oct 01 '20 at 05:47
c(0.001772662, 0.917041262, 0.825266165, 0.163495324, 1.07e-08, 1.44e-18) @Waldi – user432797 Oct 01 '20 at 05:48
The `tab.1` calculation works and gives a table with the number of occurrences per category. What is your intention with the last apply? – Waldi Oct 01 '20 at 05:53
I was trying to loop through each row to find the label(stars or NA) @Waldi, now I see that my apply is ruining the output. – user432797 Oct 01 '20 at 05:58
I removed the lapply and it worked like charm @Waldi – user432797 Oct 01 '20 at 05:58
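
A likely reading of the error discussed above, sketched with made-up data: the original `c.1` returned a two-component list, so `table()` built a two-dimensional table and `apply(tab.1, 2, sum)` had columns to sum. The `case_when` version returns a plain vector, so the table has a single dimension and `MARGIN = 2` refers to a dimension that does not exist. The vectors `stars` and `p` below are hypothetical stand-ins for the labels and for `nav$pvalue`.

```r
# Hypothetical labels and p-values for illustration only
stars <- c("**", "ns", "ns", "****")
p     <- c(0.002, 0.9, 0.8, 1e-8)

tab_1d <- table(stars)                       # vector input: one dimension
tab_2d <- table(list(p = p, stars = stars))  # list input: two dimensions

length(dim(tab_1d))
length(dim(tab_2d))

# Summing over the second dimension works on the 2-D table ...
col_sums <- apply(tab_2d, 2, sum)

# ... but fails on the 1-D table; capture the message instead of stopping
err <- tryCatch(apply(tab_1d, 2, sum),
                error = function(e) conditionMessage(e))
err
```

The exact wording of the error may differ between R versions. In either case the one-dimensional table already holds the counts, so no `apply()` is needed.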
If I wanted to use lapply what would be the approach? @Waldi – user432797 Oct 01 '20 at 05:59
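
On the `lapply` question, one possible approach is sketched below. Since `lapply` calls its function once per element, the labeller can be written for a single value with plain `if` statements. The name `label_one` and the sample vector are assumptions for illustration, with labels copied from the accepted answer. This is a base R sketch and not necessarily what the answerer would have suggested.

```r
# Hypothetical scalar labeller: takes ONE p-value, returns ONE label
label_one <- function(p) {
  if (is.na(p))   return("missing")   # test NA first: if (NA > 0.05) would fail
  if (p > 0.05)   return("ns")
  if (p > 0.01)   return("*")
  if (p > 0.001)  return("**")
  if (p > 0.0001) return("***")
  "****"
}

# Sample values: the six posted in the comments plus one NA added here
p <- c(0.001772662, 0.917041262, 0.825266165, 0.163495324,
       1.07e-08, 1.44e-18, NA)

# lapply() returns a list, so flatten it back to a character vector
labels <- unlist(lapply(p, label_one))
table(labels)
```

Because the vectorised `c.1` already handles a whole column at once, the loop is optional. `vapply(p, label_one, character(1))` would be a stricter variant that skips the `unlist()` step.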
