How to get the mean and add it to a plot for numeric data

Question

I am a novice R user and have reviewed related questions on the site. Although the title of my question has been asked before, I am experiencing some additional issues that I am unable to solve.

I was able to use the R console to make a boxplot (using boxplot(Test)) from data I imported from a .csv file that includes some NA values. Test is the name of my data; it has 3 labelled columns with 20 data points each.

But when I tried to calculate the mean (using mean(Test)) or to add it to the boxplot (using abline(v=mean(Test))), I got the following warning message:

Warning message:
In mean.default(Test) : argument is not numeric or logical: returning NA

Now when I tried: sapply(Test, mean, na.rm = TRUE), I did get the correct results but they were followed by the warning message:

Warning message:
In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA

All the right calculations come out when using summary(Test), without any warning messages. I am confused as to what the issue is. Any advice would really help. Thank you!

UPDATE2: Thank you for the answer below, worked well.

UPDATE1: Thank you to those who helped below - using colMeans(Test, na.rm=TRUE) returns the means for all my columns. However, is it possible to add the mean as a point for each individual column of data into a boxplot of all the data (i.e. all three columns)? Using abline(v=colMeans(Test)) only adds a single line into the whole plot.

Additional Info: When I use: class(Test) it returns:

[1] "data.frame"

Also I checked the type of my data (apologies if I am using any incorrect words) using sapply(Test, mode) and it returns "numeric" for all 3 columns of my data.

Partial Dataset only

a       b       c
0.68    0.68    0.68
0.28    0.28    0.28
0.62    0.62    0.62
0.73    0.73    0.73

For the `mean`, you need to specify the column, i.e. `mean(Test[,1])` for getting the mean of the first column. If there are multiple columns, `colMeans(Test)` gives the `mean` of each of the columns. Can you show some example data? — akrun, Aug 10 '15 at 06:26
You have a `factor` probably. They are `numeric` representations of categories, but you can't do maths on them. E.g. What is the mean of `factor(c("male","female"))` ? Makes no sense. Try `sapply(Test, class)` — thelatemail, Aug 10 '15 at 06:29
@thelatemail It gets the same error with numeric columns `df1 <- data.frame(1:10); mean(df1) #[1] NA Warning message: In mean.default(df1) : argument is not numeric or logical: returning NA` — akrun, Aug 10 '15 at 06:32
@akrun- I have 3 columns of data, and when I tried `colMeans(Test)` it returned: _Error in colMeans(Test) : 'x' must be numeric_. — Ameno, Aug 10 '15 at 06:32
It is because you don't have all the columns as numeric. Some columns might be factor or character class. — akrun, Aug 10 '15 at 06:36
@Rhertel: when trying `mean(as.matrix(Test))` the following is returned: [1] NA Warning message: In mean.default(as.matrix(Test)) : argument is not numeric or logical: returning NA — Ameno, Aug 10 '15 at 06:37
@akrun: But when I tested with sapply(Test, mode) it shows up as numeric under all 3 column labels — Ameno, Aug 10 '15 at 06:37
@thelatemail: When I tried: sapply(Test, class), it returned "numeric" under all three column names. — Ameno, Aug 10 '15 at 06:39
What does `dput(head(Test,1))` show? That will solve this once and for all. — thelatemail, Aug 10 '15 at 06:44
@thelatemail: When I entered dput(head(Test,1)), it returned: structure(list(Xa = 0.68, b = 0.68, c = 0.68), .Names = c("Xa", "b", "c"), row.names = 1L, class = "data.frame") — Ameno, Aug 10 '15 at 06:47
@akrun: I tried colMeans(Test) carefully and it returns the word NA under each column heading (like this): Xa b c NA NA NA — Ameno, Aug 10 '15 at 06:51
Three columns with 20 data points is a small set. Why don't you copy the entire output of `dput(Test)` in your post? — RHertel, Aug 10 '15 at 06:51
As you said, there are some `NA` values in your dataset, therefore it's probably better to use `colMeans(Test, na.rm=TRUE)` — Jaap, Aug 10 '15 at 06:59
@Rhertel: Ok I tried dput(Test) & it returned: `structure(list(Xa = c(0.68, 0.28, 0.62, 0.73, 3, 4, 5, 2.3, 2.6, 1.02, 2.33, 87, 0.62, 1.0, 1.48, 1.01, 1.0, 6.01, 1.37), b = c(0.68, .28, 0.62, 0.73, 1.9, 1.5, 0.13, 8.6, 0.12, 1.5, 0.18, 0.18, 0.07, 1.0, 1.0, 0.17, 1.0, 1.0, 0.1), c = c(0.68, 0.28, 0.62, 0.73, 5, 3.2, 1.7, 2.1, 1.9, 0.9, 3.7, 2.3, 0.13, 1.0, 0.32, 0.12, 1.0, 1.0, 1.0)), .Names = c("Xa", "b", "c"), class = "data.frame", row.names = c(NA, -19L))` (Also by the way I added a dummy value in to the empty data space in my initial text to see if that helped) — Ameno, Aug 10 '15 at 07:01
Based on the `dput` , I get `colMeans(Test)# Xa b c 6.423684 1.092632 1.456842`. As @Jaap mentioned, there might be 'NA' values in your full dataset. So, you can use `na.rm=TRUE` — akrun, Aug 10 '15 at 07:03
Thank you. Using your data from the `dput` output I can't reproduce the error. Both `colMeans(Test)` and `mean(as.matrix(Test))` work without producing an error message. But the data contains only 19 points, and not 20 as stated in the OP. — RHertel, Aug 10 '15 at 07:04
@Jaap: Yes, when I tried colMeans(Test, na.rm=TRUE) it returned results without any warnings. Will this work to add the mean to a boxplot? And if so, is it called separately after making the boxplot(Test)? — Ameno, Aug 10 '15 at 07:05
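The behaviour discussed in the comments above can be reproduced with a small sketch. The data frame `df1` below is made up for illustration (it is not the asker's data). It shows that `mean()` applied to a whole data frame returns `NA` with a warning, while `colMeans()` and `sapply()` work column by column.

```r
# Hypothetical data frame with one missing value (not the asker's data)
df1 <- data.frame(a = c(0.68, 0.28, NA, 0.73),
                  b = c(0.68, 0.28, 0.62, 0.73))

# mean() has no method for data frames: it warns and returns NA
whole <- suppressWarnings(mean(df1))

# Column-wise means; na.rm = TRUE drops missing values first
by_col  <- colMeans(df1, na.rm = TRUE)
by_col2 <- sapply(df1, mean, na.rm = TRUE)

# A single mean over all cells, via a numeric matrix
overall <- mean(as.matrix(df1), na.rm = TRUE)
```

Here `by_col` and `by_col2` should agree; `colMeans()` is the more direct choice when every column is numeric.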

score 2 · Accepted Answer · answered Aug 10 '15 at 08:12

Here is an example of how to create a boxplot out of three numeric variables and add points with the mean for each of them.

#Create example data, including some NA values
set.seed(13121)
test = data.frame(a = c(rnorm(99, 1, 1), NA), 
                  b = c(NA, rnorm(99, 0, 1)), 
                  c = rnorm(100, 2, 2))


#Calculate means for each of the columns
means = colMeans(test)

The result in this case returns NA for the first two columns, because there are NA values in them:

means
#       a        b        c 
#      NA       NA 2.021736

The solution is to add na.rm = TRUE option (see ?colMeans for more information):

means = colMeans(test, na.rm = TRUE)
means
#         a          b          c 
# 0.9843446 -0.1428516  2.0217361

Now we are ready to do the boxplot and add points with calculated means:

boxplot(test)
points(means, col = "red")

Result: (image not preserved; a boxplot of the three columns with each mean drawn as a red point)
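A slightly more explicit variant of the `points()` call in the answer, as a sketch: when `points()` is given a single vector it plots the values against their index, which happens to match the box positions 1, 2, 3. Passing the positions explicitly and picking a filled symbol (`pch = 19`, an arbitrary choice here) makes that intent visible.

```r
# Same example data as in the answer
set.seed(13121)
test <- data.frame(a = c(rnorm(99, 1, 1), NA),
                   b = c(NA, rnorm(99, 0, 1)),
                   c = rnorm(100, 2, 2))
means <- colMeans(test, na.rm = TRUE)

# boxplot() draws the boxes at x = 1, 2, 3, ...
boxplot(test)

# One point per box: x is the box position, y is the column mean
points(x = seq_along(means), y = means, col = "red", pch = 19)
```

This also explains why `abline(v = ...)` was not what the asker wanted: it draws vertical lines at x positions, whereas the means belong on the y axis.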

answered Aug 10 '15 at 08:12

hugot


Thank you, this worked! However, when I re-did this with longer names (i.e. 14 characters long) and added several variables, the plot either showed only every other name (if horizontal) or cut off some of the letters (if vertical, using `las=2`). The axis title also overlaps with the labels. Although `cex.axis` reduces the font, it is too small for my purpose. Is there a way to keep the entire axis name at the normal size (`cex.axis=1`) and not lose some of the letters in the 14-character name? Thank you again. – Ameno Aug 11 '15 at 01:36
@Ameno: For the problem of your labels being cut, you can adjust the size of the plot's margins with `par`, before doing the boxplot. For example: `par(mar = c(10, 4, 4, 2))`. Try different numbers to see what's best for you and see `?par` for more information. For the axis title, don't use `xlab` in the boxplot function and instead use `mtext` after drawing the boxplot. For example: `mtext(text = "x title", side = 1, padj = 14)`. Again, play with the `padj` parameter. See [this post](http://stackoverflow.com/questions/10286473/rotating-x-axis-labels-in-r-for-barplot) for more information. – hugot Aug 11 '15 at 08:32
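Put together, the margin and `mtext` suggestions in the comment above might look like the following sketch. The data frame `long` and its 14-character column names are invented for illustration, and `line = 8` is used here in place of `padj` as one alternative way to push the title down; the numbers will need tuning for a real plot.

```r
# Hypothetical data with 14-character column names
set.seed(1)
long <- data.frame(variable_aaaaa = rnorm(20),
                   variable_bbbbb = rnorm(20),
                   variable_ccccc = rnorm(20))

# Enlarge the bottom margin (order: bottom, left, top, right)
# and remember the previous settings
old_par <- par(mar = c(10, 4, 4, 2))

# las = 2 draws the labels perpendicular to the axis; no xlab is set here
boxplot(long, las = 2)

# Place the axis title below the rotated labels; 'line' counts margin lines
mtext(text = "x title", side = 1, line = 8)

# Restore the previous margins
par(old_par)
```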
Thank you @hugot, all this helped and I'm trying different things to see what works. Your input has been very helpful, so I wanted to ask whether there is a way to control the number of decimals or significant figures on the boxplot axis. I haven't been able to achieve this yet. – Ameno Aug 16 '15 at 22:23
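This last follow-up has no reply in the thread. One possible approach, offered only as a sketch and not as the answerer's method, is to suppress the default y axis and draw it again with labels formatted by `formatC()`. The data and the choice of two decimals are assumptions for illustration.

```r
# Hypothetical example data
set.seed(13121)
test <- data.frame(a = rnorm(100, 1, 1), b = rnorm(100, 0, 1))

# Draw the boxplot without the default y axis
boxplot(test, yaxt = "n")

# Choose tick positions, then format the labels with two decimals
ticks <- pretty(range(unlist(test), na.rm = TRUE))
tick_labels <- formatC(ticks, format = "f", digits = 2)
axis(side = 2, at = ticks, labels = tick_labels)
```

For significant figures instead of fixed decimals, `signif()` or `formatC(..., format = "g")` could be substituted in the same place.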
