Count a specific value across multiple columns and divide by the per-column total, excluding '0'

Question

I am working with a data set consisting of multiple columns that hold the values 0, 1, and 2. For each column, I am trying to count how often 1 or 2 occurs and then divide by the total number of 1s and 2s, so that 0 is excluded.

Below is the data subset:

input: [image not available]

expected result: [image not available]

Sorry for the inconvenience, I am new to this.

  df <- structure(list(Pool = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L,2L, 2L, 2L, 2L, 2L), a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L,2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L), b = c(2L, 2L, 2L,2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), c = c(0L,0L, 2L, 0L, 1L, 2L, 
    0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 2L), d = c(0L,0L, 2L, 0L, 2L, 
    2L, 0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 2L), e = c(2L,2L, 2L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L), f = c(2L,2L, 2L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L), g = c(2L,1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), h =c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L)),row.names = c(NA,15L), class = "data.frame")

If anyone could help with a solution, thank you for your time.

Welcome to Stack Overflow. We cannot read data into R from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(input)`. — neilfws, Aug 17 '22 at 01:43
@neilfws, how do I attach the data in txt format? Could you please advise? — Gopi, Aug 17 '22 at 02:07
You could copy/paste the output of `dput(input)` to your question, so that we can use your data. — Darren Tsai, Aug 17 '22 at 03:14
Thank you so much, @Darren Tsai, I tried again this time. But my values don't have an 'L' after them, and I am not sure why it appeared in the dput output. — Gopi, Aug 17 '22 at 11:25
`L` means integer values. You can run `class(1)` and `class(1L)` to see the difference. In addition, the dput output is inconsistent with the data in the image. You should check before you post it. — Darren Tsai, Aug 17 '22 at 11:40
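
A minimal sketch of the difference the `L` suffix makes (the variable names `x_dbl` and `x_int` are only for illustration):

```r
x_dbl <- 1    # a plain number literal is stored as a double
x_int <- 1L   # the L suffix makes it an integer

class(x_dbl)             # "numeric"
class(x_int)             # "integer"
x_dbl == x_int           # TRUE: the values compare equal
identical(x_dbl, x_int)  # FALSE: the storage types differ
```

`dput()` writes the `L` because the columns are stored as integers; it does not change the values themselves.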

jpsmith · Answer 1 · 2022-08-17T05:17:13.773

You could try this base R approach, though I am sure there are more elegant solutions. From your desired output, it looks like you want to calculate the proportion of 2's out of all 1's and 2's.
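
As a quick illustration of that proportion, here is the arithmetic for a single vector. The values are column `c` for `Pool` 1 from the `dput()` output in the question; the names `n_two`, `n_nonzero`, and `prop_two` are only for illustration.

```r
# Column c, Pool 1, from the dput() output in the question
x <- c(0L, 0L, 2L, 0L, 1L, 2L, 0L, 2L, 2L, 0L)

n_two     <- sum(x == 2)       # number of 2s: 4
n_nonzero <- sum(x %in% 1:2)   # number of 1s and 2s together: 5
prop_two  <- n_two / n_nonzero # 4 / 5 = 0.8

prop_two
```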

Data

Reproducible example data which, judging from your screenshot, should be heavily weighted toward the value 2:

set.seed(123)
df <- data.frame(ID = rep(1:3, each = 5),
                 matrix(sample(0:2, 120, replace = TRUE, prob = c(20/120, 8/120, 92/120)), ncol = 8))
colnames(df)[-1] <- letters[1:8]

head(df, 5)
#   ID a b c d e f g h
# 1  1 2 0 1 2 2 2 2 0
# 2  1 0 2 0 2 2 2 2 0
# 3  1 2 2 2 2 2 2 2 2
# 4  1 0 2 0 2 2 2 2 2
# 5  1 1 1 2 0 0 2 2 2

Note the numbers differ from your question since the data were provided as a screenshot and this code created data at random.

Code

The code splits the data frame by group, applies a function to each column that tabulates the values and computes the proportion (`a`), and then recombines everything into a final data frame (`b`). It also checks whether a column contains any values of 2 at all and returns NA otherwise.

a <- lapply(split(df[, -1], df$ID), function(x)
  lapply(x, function(y) {
    t <- data.frame(table(y))
    if (any(t$y == 2)) {
      round(t[t$y == 2, "Freq"] / sum(t[t$y %in% 1:2, "Freq"]), 2)  # two decimals, as in the output below
    }
    else {
      NA
    }
  }))

b <- data.frame(ID = unique(df$ID), do.call(rbind, a))

Output:

# > b
#   ID    a    b    c d e    f   g   h
# 1  1 0.67 0.75 0.67 1 1    1   1   1
# 2  2    1 0.75    1 1 1    1   1 0.6
# 3  3  0.8    1    1 1 1 0.67 0.8 0.8

Thank you so much, @jpsmith, for your wonderful help. There are two things I would like to understand in the above code. What are x and y doing in the functions? Also, my output came out as 'dbl' instead of integer, and I got an error when I tried to convert it with as.integer(b): [Error: 'list' object cannot be coerced to type 'integer'] — Gopi, Aug 17 '22 at 05:56
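
On the two points in this comment, a hedged sketch on a small made-up data frame: `x` is the sub-data-frame for one group (the ID column already dropped), and `y` is one column of that sub-data-frame, i.e. a plain vector of 0/1/2 values. The helper name `prop_two` and the toy data are assumptions for illustration, not part of the answer. Using `vapply()` instead of the inner `lapply()` is a swapped-in variation that returns plain numeric columns rather than list columns.

```r
# Small hypothetical data frame with the same layout as the example data
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                 a  = c(2, 1, 0, 2, 2, 0),
                 b  = c(0, 0, 0, 1, 2, 2))

# y: one column of one group, a vector of 0/1/2 values
prop_two <- function(y) {
  if (!any(y == 2)) return(NA_real_)       # same rule as the answer: NA when there are no 2s
  round(sum(y == 2) / sum(y %in% 1:2), 2)  # 2s divided by all 1s and 2s
}

# x: the rows of one ID, without the ID column
res <- lapply(split(df[, -1], df$ID), function(x) vapply(x, prop_two, numeric(1)))

b <- data.frame(ID = sort(unique(df$ID)), do.call(rbind, res))
str(b)  # all columns are numeric, not list
```

A data frame is itself a list, so `as.integer(b)` fails regardless; conversion has to happen column by column (for example `b$ID <- as.integer(b$ID)`). The proportions are fractions, so 'dbl' is the appropriate type for them.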

score 0 · Accepted Answer · answered Aug 17 '22 at 07:48

You could try

library(dplyr)

df %>%
  group_by(id) %>%  # `id` is the grouping column; it is named `Pool` in the posted dput() data
  summarise(across(everything(), ~ mean(.x[.x != 0] == 2)))
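
The expression `mean(.x[.x != 0] == 2)` works because the mean of a logical vector is the share of `TRUE` values: zeros are dropped first, then the remaining values are compared to 2. A minimal base R sketch of the same idea follows, using `aggregate()` as a stand-in for the dplyr pipeline; the toy data and the helper name `prop_two` are assumptions for illustration.

```r
# Small hypothetical data frame; Pool is the grouping column
df <- data.frame(Pool = c(1L, 1L, 1L, 2L, 2L),
                 c    = c(0L, 1L, 2L, 2L, 2L),
                 g    = c(2L, 2L, 0L, 0L, 1L))

# Drop zeros, then take the share of the remaining values that equal 2
prop_two <- function(x) mean(x[x != 0] == 2)

res <- aggregate(df[-1], by = list(Pool = df$Pool), FUN = prop_two)
res
```

Note that a group whose column holds only zeros leaves an empty vector after filtering, so the mean is `NaN` rather than `NA`.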

Darren Tsai

This answer is straight as an arrow! Awesome, @Darren Tsai. – Gopi Aug 17 '22 at 11:25
