I want to count the different labels in one column of a data frame. I'm fine with the output. I used:

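library(dplyr)  # provides case_when()

# Map p-values to significance labels; conditions are checked in order
c.1 <- function(n) case_when(n > 0.05   ~ 'ns',
                             n > 0.01   ~ '*',
                             n > 0.001  ~ '**',
                             n > 0.0001 ~ '***',
                             n >= 0     ~ '****',
                             is.na(n)   ~ 'missing')
p.type1 <- c.1(nav$pvalue)
tab.1 <- table(c.1(nav$pvalue))
nav <- as.factor(tab.1)
nav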

The output:

*: 12 **: 23 ***: 44 ****: 76 missing: 5 n.s.: 109

First row of my data input:

    n.name    bMean      log2FoldChange  lfcSE      stat        pvalue    padj
    <fct>     <dbl>      <dbl>           <dbl>      <dbl>       <dbl>     <dbl>
469 TNFRSF1B  542.82545  -3.406411       0.2267235  -15.024517  5.07e-51  3.25e-48


Is there a way to get the same results using lapply?
user432797
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. Here `nav` seems to be missing. Why do you want to avoid using `table()` when it seems to return the results you want? Seems odd to require use of a function that doesn't seem to be necessary – MrFlick Oct 01 '20 at 19:49
  • Sure, you can slow it down and make it inefficient by taking a single vectorized call to `c.1` and turning it into `length(somelst)` calls to the same function with a single value each call. But why would you? In addition to asking ([again](https://stackoverflow.com/a/63380754/3358272)) for reproducible data (perhaps `dput(head(nav))`), it might help to give some context on why something like `cut(n, c(0,0.0001,0.001,0.01,0.05),labels=...)` doesn't suffice. – r2evans Oct 01 '20 at 19:57
  • @MrFlick I updated the question with the first row of the input data. I totally agree with you, table is the easiest way I've tried so far, but I need to know how to do it with lapply. I'm getting an error when I loop over it with lapply. – user432797 Oct 01 '20 at 21:31
  • @r2evans That's exactly what I'm thinking, it will definitely slow it down! I appreciate that you gave my question a look. I agree lapply is bothersome compared to table, but I need to learn and I'm new. I appreciate your input. Can you demonstrate with syntax? It would be a great deal of help. – user432797 Oct 01 '20 at 21:35
  • `lapply(nav$pvalue, c.1)`? – r2evans Oct 01 '20 at 21:36
  • @r2evans so if I use `c.1 <- function(n) case_when( n > 0.05 ~ 'ns', n > 0.01 ~ '*', n > 0.001 ~ '**', n > 0.0001 ~ '***', n >=0 ~ '****', is.na(n) ~ 'missing')` and then `tab.1a<-lapply(nav$pvalue, c.1)`, I get this error: `Error in nav$pvalue: $ operator is invalid for atomic vectors Traceback: 1. lapply(nav$pvalue, c.1)` – user432797 Oct 02 '20 at 00:44
  • ***What is `nav`?*** That error has to do with your data, not with the function. Just like we asked in your [previous question](https://stackoverflow.com/questions/64130346/counting-different-occurrences-in-r-for-special-character) and in comments above, please make your question reproducible by including the output from `dput(head(nav))` (or just `dput(head(nav$pvalue))`, if that even works). – r2evans Oct 02 '20 at 01:47
  • @r2evans Thanks for the tip, it was my mistake. I changed `nav` back to the original data frame and tried `c.1 <- function(n) case_when( n > 0.05 ~ 'ns', n > 0.01 ~ '*', n > 0.001 ~ '**', n > 0.0001 ~ '***', n >=0 ~ '****', is.na(n) ~ 'missing')`, then `head(nav.2<-lapply(nav$pvalue, c.1))` and `dput(head(nav$pvalue))`, and I got `'**' 'ns' 'ns' 'ns' '****' '****'` and `c(0.001772662, 0.917041262, 0.825266165, 0.163495324, 1.07e-08, 1.44e-18)` – user432797 Oct 02 '20 at 02:03
  • @r2evans it is not giving me the count of each label like in table function! – user432797 Oct 02 '20 at 02:04
  • (1) You asked for a way to use your `c.1` function in `lapply`. It does that. (2) Because it returns a `list`, it will not work with `table`. So ... ummm ... `table(sapply(nav$pvalue, c.1))`? – r2evans Oct 02 '20 at 02:07
  • @r2evans you are a great deal of help, I appreciate your support. I don't want to use the table function in this example; I'm trying to avoid table and use lapply to return the same output. – user432797 Oct 02 '20 at 02:39

1 Answer

So you don't want to use table? lapply isn't a perfect replacement, and this is inefficient in that it is making a lot of comparisons that really don't need to be made. But try this:

nav <- list(pvalue=c(0.001772662, 0.917041262, 0.825266165, 0.163495324, 1.07e-08, 1.44e-18))
c.1(nav$pvalue)
# [1] "**"   "ns"   "ns"   "ns"   "****" "****"
tmp <- c.1(nav$pvalue)
lapply(setNames(nm=c('ns','*','**','***','****','missing')), function(a) sum(tmp == a))
# $ns
# [1] 3
# $`*`
# [1] 0
# $`**`
# [1] 1
# $`***`
# [1] 0
# $`****`
# [1] 2
# $missing
# [1] 0

If you don't like the elongated list representation, consider sapply:

sapply(setNames(nm=c('ns','*','**','***','****','missing')), function(a) sum(tmp == a))
#      ns       *      **     ***    **** missing 
#       3       0       1       0       2       0 

One thing I'd suggest: wherever you define c.1 with all of those labels in it, also define the vector of these labels, so that you don't risk mis-typing (or missing) any of them.
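
A minimal sketch of that suggestion, assuming dplyr is available as in the question. The name `p_labels` is hypothetical, and the sample `pvalue` vector is the six values from the comments plus one `NA` added here to exercise the 'missing' branch. `vapply` is swapped in for `sapply` only to make the integer return type explicit:

```r
library(dplyr)  # provides case_when(), as used in the question

# Hypothetical name: one vector holding every label, defined once.
p_labels <- c('ns', '*', '**', '***', '****', 'missing')

# Same thresholds as the question's c.1, but the labels come from p_labels.
c.1 <- function(n) case_when(
  n > 0.05   ~ p_labels[1],
  n > 0.01   ~ p_labels[2],
  n > 0.001  ~ p_labels[3],
  n > 0.0001 ~ p_labels[4],
  n >= 0     ~ p_labels[5],
  is.na(n)   ~ p_labels[6]
)

# Sample values from the comments, plus one NA (an assumption for illustration).
pvalue <- c(0.001772662, 0.917041262, 0.825266165, 0.163495324,
            1.07e-08, 1.44e-18, NA)
tmp <- c.1(pvalue)

# Count each label by iterating over the same vector that c.1 uses.
counts <- vapply(setNames(nm = p_labels), function(a) sum(tmp == a), integer(1))
counts
```

Because both the function and the counting loop read from `p_labels`, a label that is renamed or added in one place cannot silently drift out of sync with the other.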

r2evans
  • It worked, there is a testthat function installed, and I'm getting this error: 'Error: Test failed: 'test missing p-value' * :2: `x` inherits from `character` not `logical`. Traceback: 1. test_that("test missing p-value", { . expect_that(pType(NA), is_a("logical")) . }) 2. test_code(desc, code, env = parent.frame()) 3. get_reporter()$end_test(context = get_reporter()$.context, test = test) 4. stop(message, call. = FALSE)' what does that mean and how to resolve it? – user432797 Oct 02 '20 at 03:33
  • I suggest you write a new reproducible question asking about `testthat` errors. Be sure to include the (package) code involved, the (testthat) code that fails, and this error stacktrace. – r2evans Oct 02 '20 at 03:35