When using mean(), sd(), etc. functions with a dataframe, I'm getting an 'argument is not numeric or logical' error.

I created a simple frame from two vectors to test functionality (i.e. to use a stat function with a data frame).

str() gives the following:

'data.frame':   195 obs. of  2 variables:
 $ Births  : num  10.2 35.3 46 12.9 11 ...
 $ Internet: num  78.9 5.9 19.1 57.2 88 ...

Using the mean() function:

mean(frame2, na.rm=TRUE)

Gives:

Warning message: In mean.default(frame2, na.rm = TRUE) : argument is not numeric or logical: returning NA

I've seen previous advice to not use mean() with a data frame, which is fine, but not the point.

I'm going through the O'Reilly R Cookbook, and it claims you should be able to use mean() and sd() with a dataframe.

However, I can't make it work.

Grant Miller
eNTROPY
  • Welcome to Stack Overflow! Please take the [tour](https://stackoverflow.com/tour) and read through the [help center](http://stackoverflow.com/help), in particular how to ask. Your best bet here is to do your research, search for related topics on SO, and give it a go. After doing more research and searching, post a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) of your attempt and say specifically where you're stuck, which can help you get better answers. – help-info.de Sep 09 '18 at 09:02
  • Possible duplicate of [How to get mean, median, and other statistics over entire matrix, array or dataframe?](https://stackoverflow.com/questions/9424311/how-to-get-mean-median-and-other-statistics-over-entire-matrix-array-or-dataf) – pogibas Sep 09 '18 at 09:07
  • When I do this: mean(frame2$Births, na.rm=TRUE) and mean(frame2$Internet, na.rm=TRUE) it does work, but the book suggests mean(frame2) should also work as R would calculate the mean for each column separately. I can't reproduce that – eNTROPY Sep 09 '18 at 09:15
  • You can `apply` the mean function: `apply(frame2, 2, mean)` – divibisan Sep 10 '18 at 18:00

2 Answers

About your problem:

I don't have access to your book or other learning resources, but the best learning tool is R's built-in help. To see what kinds of arguments mean() accepts, run ?mean, which says:

mean(x, trim = 0, na.rm = FALSE, ...)
Arguments

x   An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only. 

So, as the help page explains, mean() is meant for vectors, not whole data frames. Also, judging by this question, I think your book is a little old. Check your R version and compare it with the one the book uses.
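The warning in the question is raised by `mean.default`, which suggests that this R version has no data-frame method for `mean()`. As far as I know, that method was deprecated in R 2.14 and removed in R 3.0.0, which would explain why an older book still shows it. Below is a minimal sketch of per-column alternatives; `frame2` here is a hypothetical stand-in with invented values, not the asker's data.

```r
# Hypothetical stand-in for the asker's frame2; values are invented
frame2 <- data.frame(
  Births   = c(10.2, 35.3, 46.0, 12.9, NA),
  Internet = c(78.9, 5.9, 19.1, 57.2, 88.0)
)

# sapply() treats the data frame as a list of columns and calls mean() on each
col_means <- sapply(frame2, mean, na.rm = TRUE)

# colMeans() gives the same result when every column is numeric
col_means2 <- colMeans(frame2, na.rm = TRUE)

# sd() does not work per column on a data frame either, so use the same pattern
col_sds <- sapply(frame2, sd, na.rm = TRUE)

print(col_means)
print(col_sds)
```

`sapply()` also works with other summary functions such as `median`, while `colMeans()` is limited to means but is usually faster on large frames.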


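Calling mean() on a single column works well for me in this example:

# Build a small test data frame with two integer columns
dt <- data.frame(Births   = sample(1:100, 50),
                 Internet = sample(1:100, 50))

str(dt)
mean(dt$Births)  # mean of one column, which is a plain vector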

It still works if I convert the columns to num:

# Same data frame, but with the columns stored as numeric (double)
dt <- data.frame(Births   = as.numeric(sample(1:100, 50)),
                 Internet = as.numeric(sample(1:100, 50)))

str(dt)
mean(dt$Births)

If you want to pass the whole data frame and get general information in one go, you can use the summary() function:

summary(iris)
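`summary()` reports the mean but not the standard deviation. A small sketch of how both could be collected per column, using the built-in `iris` data set and skipping its non-numeric `Species` column (the variable names are only illustrative):

```r
# iris ships with base R (datasets package); Species is a factor, so skip it
numeric_cols <- sapply(iris, is.numeric)

iris_means <- sapply(iris[numeric_cols], mean)
iris_sds   <- sapply(iris[numeric_cols], sd)

# Combine into one small table: one row per column, one column per statistic
stats <- cbind(mean = iris_means, sd = iris_sds)
print(round(stats, 2))
```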
Sal-laS

Two options: the first works if all columns are indeed numeric; the second summarizes only the numeric columns:

library(dplyr)  # also makes the %>% pipe available
dt %>% summarise_all(mean)
dt %>% summarise_if(is.numeric, mean)


  Births Internet
1  47.86    47.52
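In recent dplyr releases (1.0.0 and later, as far as I know), `summarise_all()` and `summarise_if()` are superseded by `across()`. A sketch of the equivalent call, assuming dplyr is installed; `dt2` and its `Country` column are hypothetical and only there to show a non-numeric column being skipped:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 is installed

# Hypothetical data with one non-numeric column
dt2 <- data.frame(
  Births   = c(10, 20, 30),
  Internet = c(50, 60, 70),
  Country  = c("A", "B", "C")
)

# Mean of every numeric column; Country is left out of the result
num_means <- dt2 %>% summarise(across(where(is.numeric), mean))
print(num_means)
```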
Jon Spring