I'm trying to calculate the mean of a data frame column subject to a condition, so first I did:

mean(Ykkonen$deltaA[Ykkonen$PH<=2.5], na.rm = TRUE)

but when I try this instead

Ykkonen %>% filter(PH<=2.5) %>% mean(deltaA, na.rm = TRUE)

I get NA and a warning:

[1] NA
Warning message:
In mean.default(., deltaA) :
argument is not numeric or logical: returning NA

Yet deltaA is numeric. So I am trying to understand why using the pipe %>% makes any difference.

As I understand it, dataframe %>% filter(a=='s') should return only the rows that have 's' for the variable a. Am I right?

loki
bm1125
    Please try to come up with a [reproducible example](https://stackoverflow.com/q/5963269/3250126). That way, we can help you better. – loki Aug 25 '18 at 09:59

2 Answers


You need to use a summarise function to get the result you want.

Ykkonen %>% filter(PH<=2.5) %>% summarise(mean = mean(deltaA, na.rm = TRUE))
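
Since the Ykkonen data is not available, here is a minimal sketch of the same pattern on the built-in mtcars data set. Using qsec and drat as stand-ins for PH and deltaA is purely an assumption for illustration.

```r
library(dplyr)

# mtcars stands in for Ykkonen; qsec and drat stand in for PH and deltaA.
res <- mtcars %>%
  filter(qsec >= 17) %>%
  summarise(mean = mean(drat, na.rm = TRUE))

class(res)   # a data frame with one row and one column named "mean"
res$mean     # the mean itself as a plain numeric value
```

Note that summarise returns a one-row data frame rather than a bare number, so extract the column if you need the value on its own.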

Check what Ykkonen %>% filter(PH<=2.5) returns: it is a data.frame (or tibble), not a vector. The pipe passes that data.frame to mean as its first argument, so mean is called on the whole data.frame rather than on the deltaA column, which produces the warning and the NA result. One of the first checks mean.default performs is:

if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
}

A data.frame is not numeric, complex, or logical, so it triggers this branch and mean returns NA.
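
The check can be reproduced in base R without the pipe. This is only a sketch: mtcars is used as a stand-in for the asker's data, and the subsetting line is assumed to be a rough base-R equivalent of the filter step.

```r
# mtcars stands in for the asker's Ykkonen data, which is not available here.
df <- mtcars[mtcars$qsec >= 17, ]   # rough base-R equivalent of filter(qsec >= 17)

is.numeric(df)        # FALSE: a data frame is a list of columns
is.numeric(df$drat)   # TRUE: a single column is a numeric vector

# mean() on the whole data frame hits the check above and yields NA
# (the warning is suppressed here to keep the output clean)
res_df <- suppressWarnings(mean(df))

# mean() on the column works as expected
res_col <- mean(df$drat, na.rm = TRUE)
```

Here res_df is NA while res_col is an ordinary number, which mirrors the difference between the two calls in the question.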

phiver

If you want the result as a numeric vector of length 1 rather than a data.frame, use pull to extract the column as a vector before calling mean:

Ykkonen %>% filter(PH<=2.5) %>% pull(deltaA) %>% mean(na.rm = TRUE)

Here is a reproducible example:

library(dplyr)

mtcars %>% filter(qsec >= 17) %>% pull(drat) %>% mean(na.rm = TRUE)
# [1] 3.561304

To make sure that only numeric columns are averaged, you could also use summarize_if like this:

mtcars %>% filter(qsec >= 17) %>% summarize_if(is.numeric, mean) %>% pull(drat)
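
In dplyr 1.0.0 and later, summarize_if is superseded by across(). A sketch of the equivalent call, assuming a dplyr version that provides across() and where():

```r
library(dplyr)

# Average every numeric column of the filtered rows, then extract drat
# as a plain numeric vector of length 1.
res <- mtcars %>%
  filter(qsec >= 17) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  pull(drat)
```

This computes the mean of all numeric columns only to keep one of them, so the pull-then-mean version above does less work when a single column is all you need.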
loki