
I would like to use %>% to pipe a data frame into colSums. More generally, I would like to be able to do this for all my calculations.

Here is my example:

I can use the following code to reach my goal:

result <- colSums(!is.na(df[ , c("A", "B", "C", "D", "RT", "PR", "OTH")]), na.rm = TRUE)

How can I rewrite my code as something like this:

result <- df[ , c("A", "B", "C", "D", "RT", "PR", "OTH")] %>%
  colSums(!is.na(), na.rm = TRUE)

This code did not work; I got the error `Error in is.na() : 0 arguments passed to 'is.na' which requires 1`. Could anyone give me some guidance?

Thanks

Update:

Sample data:

df<-structure(list(A = c("A", NA, NA, NA, NA, NA, NA, NA), B = c(NA, 
NA, "B", NA, NA, NA, NA, NA), C = c(NA, "C", NA, NA, NA, NA, 
NA, NA), D = c(NA, NA, NA, "D", "D", NA, NA, NA), RT = c(NA, 
"RT", NA, NA, NA, NA, "RT", NA), PR = c(NA, NA, "PR", NA, NA, 
NA, NA, NA), OTH = c(NA, NA, NA, NA, "OTH", NA, NA, "OTH")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))
Stataq
    You already have an answer, but for more general cases, these posts may be useful: [Using the %>% pipe, and dot (.) notation](https://stackoverflow.com/questions/42385010/using-the-pipe-and-dot-notation), [What does the dplyr period character “.” reference?](https://stackoverflow.com/questions/35272457/what-does-the-dplyr-period-character-reference) – Henrik Mar 31 '21 at 15:01

3 Answers


What the pipe does is take whatever comes before it and put it in as the first argument of the function call that comes after it, so:

# What the pipe does
## with pipe
x %>% foo(other_arg)
## equivalent to this:
foo(x, other_arg)

## your version piped:
df[ , c("A", "B", "C","D", "RT", "PR", "OTH")] %>%
  colSums(!is.na(), na.rm = TRUE)

## is interpreted like this:
colSums(df[ , c("A", "B", "C","D", "RT", "PR", "OTH")], !is.na(), na.rm = TRUE)

Hopefully the above makes sense, and you can see why you get an error about is.na() needing an argument.
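A minimal sketch that checks both points, assuming the magrittr package is installed. `small_df` is a made-up toy data frame, not from the question. The pipe only fills in the first argument, and the question's piped call fails as soon as `is.na()` is evaluated without an argument:

```r
library(magrittr)

# Hypothetical toy data, only for illustration
small_df <- data.frame(A = c("a", NA), B = c(NA, NA))

# The pipe inserts the left-hand side as the first argument:
# small_df %>% head(1) is the same call as head(small_df, 1)
stopifnot(identical(small_df %>% head(1), head(small_df, 1)))

# The question's attempt expands to colSums(small_df, !is.na(), na.rm = TRUE).
# The unnamed !is.na() lands in colSums's `dims` argument; evaluating it
# calls is.na() with no argument, which raises the error from the question.
err <- tryCatch(
  small_df %>% colSums(!is.na(), na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)
```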

You can use the pipe, but as you note, the ! needs special handling. ! is a prefix operator with lower precedence than %>% (which is why !x %in% y means !(x %in% y)), so writing ! in the middle of a pipe chain negates everything to its right rather than just the next step. To work around this, we can call ! explicitly as a function rather than as a prefix operator. Alternatively, if you load the magrittr package (the original source of %>%), it provides aliases for cases like this, including not(), which is an alias for !. Both are demonstrated below:

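library(magrittr)  # provides %>% and not()

# Option 1: call `!` as an ordinary function
df[ , c("A", "B", "C", "D", "RT", "PR", "OTH")] %>%
  is.na() %>%
  `!`() %>%
  colSums(na.rm = TRUE)

# Option 2: use magrittr's not(), an alias for `!`
df[ , c("A", "B", "C", "D", "RT", "PR", "OTH")] %>%
  is.na() %>%
  not() %>%
  colSums(na.rm = TRUE)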
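A self-contained sketch of both pipelines run on the question's sample data (rebuilt here as a plain data.frame rather than a tibble, and assuming magrittr is installed), checked against the base-R colSums(!is.na(...)) result. na.rm = TRUE is dropped because !is.na() never produces NA:

```r
library(magrittr)  # provides %>% and not()

# Sample data from the question, as a plain data.frame
df <- data.frame(
  A   = c("A", NA, NA, NA, NA, NA, NA, NA),
  B   = c(NA, NA, "B", NA, NA, NA, NA, NA),
  C   = c(NA, "C", NA, NA, NA, NA, NA, NA),
  D   = c(NA, NA, NA, "D", "D", NA, NA, NA),
  RT  = c(NA, "RT", NA, NA, NA, NA, "RT", NA),
  PR  = c(NA, NA, "PR", NA, NA, NA, NA, NA),
  OTH = c(NA, NA, NA, NA, "OTH", NA, NA, "OTH")
)
cols <- c("A", "B", "C", "D", "RT", "PR", "OTH")

# Base R reference result, without a pipe
base_result <- colSums(!is.na(df[, cols]))

# Piped version calling `!` as a regular function
piped_bang <- df[, cols] %>% is.na() %>% `!`() %>% colSums()

# Piped version using magrittr's not() alias
piped_not <- df[, cols] %>% is.na() %>% not() %>% colSums()

print(piped_not)
```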
Gregor Thomas

I have updated my code as follows. I don't know why, but when I negate is.na I can't get the desired result with a pipe:

colSums(!is.na(df[ , c("A", "B", "C","D", "RT", "PR", "OTH")]))

  A   B   C   D  RT  PR OTH 
  1   1   1   2   2   1   2 

If you want to stick to base R, this is how you can count the values that are not NA.
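Another option that keeps !is.na() and still pipes is to count per column with vapply(). This is a sketch, not part of the answer above, and it assumes magrittr is installed for %>%. A data frame is a list of columns, so the anonymous function receives one column vector at a time, and integer(1) declares that each call returns a single integer:

```r
library(magrittr)

# Sample data from the question, as a plain data.frame
df <- data.frame(
  A   = c("A", NA, NA, NA, NA, NA, NA, NA),
  B   = c(NA, NA, "B", NA, NA, NA, NA, NA),
  C   = c(NA, "C", NA, NA, NA, NA, NA, NA),
  D   = c(NA, NA, NA, "D", "D", NA, NA, NA),
  RT  = c(NA, "RT", NA, NA, NA, NA, "RT", NA),
  PR  = c(NA, NA, "PR", NA, NA, NA, NA, NA),
  OTH = c(NA, NA, NA, NA, "OTH", NA, NA, "OTH")
)
cols <- c("A", "B", "C", "D", "RT", "PR", "OTH")

# Count non-missing entries column by column
counts <- df[, cols] %>% vapply(function(x) sum(!is.na(x)), integer(1))
print(counts)
```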

Anoushiravan R
  • Just bear in mind that when you pipe data into another function, it is passed as that function's first argument, so that argument should accept a data frame or a vector. – Anoushiravan R Mar 31 '21 at 14:56
  • Thanks for the answer. The reason I used `!is.na()` is to count the number of cases with an answer for `A, B, C, D`, etc. They are character variables rather than numbers. Is there any way I can keep `!is.na()`? – Stataq Mar 31 '21 at 16:15
  • You're welcome. Not inside `colSums`, but you can use it in your result vector. – Anoushiravan R Mar 31 '21 at 16:18
  • I just updated the post with sample data. Thanks. – Stataq Mar 31 '21 at 16:21
  • I updated my code. It's a strange case: with `is.na` we can use the pipe, but if I negate it, the result is not the one you want. – Anoushiravan R Mar 31 '21 at 16:41
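On keeping `!is.na()` inside `colSums()`: with magrittr, wrapping the right-hand side in braces stops the pipe from inserting the data as the first argument, and `.` then stands for the piped data explicitly. Without the braces, a `.` that appears only inside a nested call such as `!is.na(.)` does not stop that insertion, so `colSums(!is.na(.))` would expand to `colSums(<data>, !is.na(<data>))`. A sketch, assuming magrittr is installed:

```r
library(magrittr)

# Sample data from the question, as a plain data.frame
df <- data.frame(
  A   = c("A", NA, NA, NA, NA, NA, NA, NA),
  B   = c(NA, NA, "B", NA, NA, NA, NA, NA),
  C   = c(NA, "C", NA, NA, NA, NA, NA, NA),
  D   = c(NA, NA, NA, "D", "D", NA, NA, NA),
  RT  = c(NA, "RT", NA, NA, NA, NA, "RT", NA),
  PR  = c(NA, NA, "PR", NA, NA, NA, NA, NA),
  OTH = c(NA, NA, NA, NA, "OTH", NA, NA, "OTH")
)
cols <- c("A", "B", "C", "D", "RT", "PR", "OTH")

# Braces: the piped data is NOT inserted as the first argument;
# `.` refers to it explicitly, so the original expression stays intact.
result <- df[, cols] %>% { colSums(!is.na(.)) }
print(result)
```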

A dplyr-style version would be:

result <- df[ , c("A", "B", "C","D", "RT", "PR", "OTH")] %>% mutate(across(everything(), ~colSums(!is.na(.), na.rm = TRUE))) 
AnilGoyal
  • I got this error: Error: Problem with `mutate()` input `..1`. x 'x' must be an array of at least two dimensions i Input `..1` is `across(everything(), ~colSums(!is.na(.), na.rm = TRUE))`. – Stataq Mar 31 '21 at 17:09
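That error arises because inside `across()` each column is handed to the function on its own as a plain vector, and `colSums()` needs something with at least two dimensions. A sketch of a dplyr version that should work, assuming dplyr 1.0 or later (for `across()`): count with `sum()` per column, and use `summarise()` rather than `mutate()`, since the goal is one count per column rather than a value per row:

```r
library(dplyr)

# Sample data from the question, as a plain data.frame
df <- data.frame(
  A   = c("A", NA, NA, NA, NA, NA, NA, NA),
  B   = c(NA, NA, "B", NA, NA, NA, NA, NA),
  C   = c(NA, "C", NA, NA, NA, NA, NA, NA),
  D   = c(NA, NA, NA, "D", "D", NA, NA, NA),
  RT  = c(NA, "RT", NA, NA, NA, NA, "RT", NA),
  PR  = c(NA, NA, "PR", NA, NA, NA, NA, NA),
  OTH = c(NA, NA, NA, NA, "OTH", NA, NA, "OTH")
)
cols <- c("A", "B", "C", "D", "RT", "PR", "OTH")

# One row, one column per variable: the number of non-missing values
result <- df %>%
  select(all_of(cols)) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))
print(result)
```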