0

Example of some entries in the data frame:

[screenshot of data frame entries]

I need to find the mean of this column in the data frame, but R will not compute it and says:

"argument is not numeric or logical: returning NA"

The non-numeric entries are dash signs. I have tried converting them to NA but am still struggling to produce a result for the mean.

Can anyone help?

Z.Lin
rubz22
  • Could you please provide an example that we can reproduce in R, so we do not have to create it from scratch based on your screenshot? – deca Sep 14 '17 at 06:24
  • 1
    Please post a data example in another way, for instance, post the output of `dput(head(df))`, where `df` is the name of your data frame. Also, if you are reading the data from file using the `read.table` family of functions, take a look at argument `na.strings`. – Rui Barradas Sep 14 '17 at 06:26
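
As a sketch of the `na.strings` suggestion above: the CSV text and the column names `region` and `death_rate` are made up for illustration, since the original file is not shown. Telling the reader that a dash means "missing" lets the column come in as numeric straight away.

```r
# Hypothetical CSV content standing in for the real file.
csv_text <- "region,death_rate
A,6.2
B,-
C,7.4"

# na.strings = "-" makes read.csv treat a lone dash as missing,
# so the column is parsed as numeric from the start.
df <- read.csv(text = csv_text, na.strings = "-")

class(df$death_rate)
mean_rate <- mean(df$death_rate, na.rm = TRUE)
mean_rate
```

With a real file, the same argument would be passed alongside the file path instead of `text =`.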

4 Answers

4

Try this, assuming your data is called dat:

dat[dat == "-"] <- NA

mean(dat$Population_and_People, na.rm = TRUE)
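
A small worked example may help here. The data frame below is invented, and it assumes the column was read in as character. Note that replacing the dashes alone leaves the column as character, so it most likely still needs converting before `mean` will accept it.

```r
# Hypothetical data frame; stringsAsFactors = FALSE keeps the column as character.
dat <- data.frame(Population_and_People = c("6.2", "-", "7.4"),
                  stringsAsFactors = FALSE)

dat[dat == "-"] <- NA                 # dashes become NA
class(dat$Population_and_People)      # still "character"

# Convert to numeric before averaging; the NA entries stay NA.
dat$Population_and_People <- as.numeric(dat$Population_and_People)
result <- mean(dat$Population_and_People, na.rm = TRUE)
result
```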
deca
  • Thank you for your help! It successfully made all the dashes NA. I have one other entry, in line 6, that is words. I originally thought I could find the mean by using the tail function to exclude it, but it's not working. This is what I did, do you have any idea? `mean(tail(death_rate_col, -6), death_rate_col, na.rm = TRUE)` (I named the data frame holding this column death_rate_col) – rubz22 Sep 14 '17 at 08:56
  • Update: I actually have 2 rows at the beginning that are words; the rest are numbers or NAs. I used your code for one of the words and it made it NA (and I think disappear), but it is not working for the other entry, which is "Standardised death rate (per 1,000 population)". The one that did work was simply "rate". Any idea why the Standardised one will not go to NA? I am still getting this response from R: `[1] NA Warning message: In mean.default(death_rate_col, na.rm = TRUE) : argument is not numeric or logical: returning NA` – rubz22 Sep 14 '17 at 09:07
  • Sorry, I am not sure I understand your problem. So you are saying that you have character values in your vector other than `-`? What happens if you try the following: `dat$Population_and_People <- as.numeric(dat$Population_and_People)` This will convert all the values in the vector to numeric. For `-` and other character values, R will return `NA`, because it cannot convert them to numbers. Then you should be able to call `mean(dat$Population_and_People, na.rm = TRUE)` – deca Sep 14 '17 at 09:24
  • Thanks, I will try this. Because my names are different to yours I just wanted to clarify: what is the difference between `dat` and `Population_and_People`? I believe you said `dat` was the name of my data, as in the name of my data frame? And then what is `Population_and_People` referring to? – rubz22 Sep 14 '17 at 09:30
  • Disregard the above ^ I worked out that `dat` was my data frame and `Population_and_People` was the column name. I did as you said, but weirdly it completely changed the values of my entries: one that was 6.2 now says 41, and the entry that was characters turned to 78. The mean will be completely wrong now. This was my exact code: `death_rate_col$Population_and_People.X__76<- as.numeric(death_rate_col$Population_and_People.X__76)` and then the values changed, so my mean `mean(death_rate_col$Population_and_People.X__76, na.rm = TRUE)` is completely off: `[1] 38.94643` – rubz22 Sep 14 '17 at 09:45
  • It sounds to me that your values were specified as factors and by using `as.numeric()` R converted them to their factor levels. So please try to apply the following to your original column: `dat$Population_and_People <- as.numeric(as.character(dat$Population_and_People ))` Then apply `mean(dat$Population_and_People, na.rm = TRUE)` Please let me know if it works and if so mark my answer as accepted. – deca Sep 14 '17 at 09:59
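
To illustrate the factor issue discussed in the comments above, here is a minimal sketch with a made-up factor (the real column is not available). Calling `as.numeric()` directly on a factor returns the integer level codes, while going through `as.character()` first recovers the values as written.

```r
# Hypothetical factor column mixing numbers and a dash.
f <- factor(c("6.2", "-", "7.4", "6.2"))

# Level codes: small integers that index levels(f), not the original values.
codes <- as.numeric(f)

# Actual values: "-" cannot be parsed, so it becomes NA (warning suppressed).
values <- suppressWarnings(as.numeric(as.character(f)))

codes
values
mean(values, na.rm = TRUE)
```

This would explain a 6.2 turning into 41: that number is the position of "6.2" among the sorted levels of the column, not a value from the data.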
2

This isn't using the supplied data but should be enough to show the desired result. Note this is related to "How to avoid warning when introducing NAs by coercion".

x <- c("5", "-", "15")
mean(suppressWarnings(as.numeric(as.character(x))), na.rm = TRUE)
#> [1] 10
markdly
1

Yet another way.

is.na(dat$Population_and_People.X__76) <- dat$Population_and_People.X__76 == "-"

Followed by mean with na.rm = TRUE.
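
A short sketch of the replacement form of `is.na`, using an invented character vector rather than the real column:

```r
# Hypothetical character vector standing in for the column.
x <- c("6.2", "-", "7.4")

# Replacement form: positions where the right-hand side is TRUE are set to NA.
is.na(x) <- x == "-"
x

# The vector is still character, so convert before taking the mean.
m <- mean(as.numeric(x), na.rm = TRUE)
m
```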

EDIT
Note that your column is probably of class factor. A vector can only hold one type of data, so if it contains a character such as "-", the entire column will be converted to class character in the first step and then to factor. This last step is the default behaviour in R versions before 4.0.0; you must set stringsAsFactors = FALSE in order for it not to happen. The (not so) practical result is that you cannot use mean on that column. You will most probably need to do

dat$Population_and_People.X__76 <- as.numeric(as.character(dat$Population_and_People.X__76))

Before you do this check the class of that column, either with class(dat$Population_and_People.X__76) or with str(dat).
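
As a rough illustration of that check, with two invented one-column data frames (the column name `v` is hypothetical). `stringsAsFactors` is set explicitly in both so the result does not depend on which R version is running.

```r
# Same hypothetical data, built with and without factor conversion.
as_factor <- data.frame(v = c("6.2", "-"), stringsAsFactors = TRUE)
as_char   <- data.frame(v = c("6.2", "-"), stringsAsFactors = FALSE)

class(as_factor$v)   # "factor"
class(as_char$v)     # "character"

# str() shows the class of every column at once.
str(as_char)
```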

Rui Barradas
  • Maybe I am misunderstanding the problem, but I think that the poster has no NAs in his data, but "-". So your is.na() query would not return anything to replace by "-". It should be the other way, replacing "-" by NA, or not? – deca Sep 14 '17 at 06:31
  • 2
    @Martin No, you're wrong. Function `is.na` can return `FALSE/TRUE` if a vector has `NA` in it but can also *set* a vector's values to `NA`, when used like I'm using it. – Rui Barradas Sep 14 '17 at 06:33
  • OK thanks, I did not know that! – deca Sep 14 '17 at 06:39
0

Try this:

dataset$Population_and_People.X_76 <- gsub("-", NA, dataset$Population_and_People.X_76, fixed = TRUE)
dataset$Population_and_People.X_76 <- as.numeric(dataset$Population_and_People.X_76)
mean(dataset$Population_and_People.X_76, na.rm = TRUE)

With na.rm = TRUE, the replaced records (the hyphens, now NA) are not counted in the denominator when the mean is calculated.
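
The effect on the denominator can be seen with a small made-up vector:

```r
# Hypothetical values; the NA stands for a former hyphen.
x <- c(5, NA, 15)

# na.rm = TRUE drops the NA, so the denominator is 2: (5 + 15) / 2
with_na_rm <- mean(x, na.rm = TRUE)

# The same calculation written out by hand.
by_hand <- sum(x, na.rm = TRUE) / sum(!is.na(x))

# For comparison, counting the NA row in the denominator: 20 / 3
all_rows <- sum(x, na.rm = TRUE) / length(x)

c(with_na_rm, by_hand, all_rows)
```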

Vash