I have a column in a dataframe and I am trying to find the mean. I used:

mean(dat$Age, na.rm=TRUE)

and got a warning that the argument is not numeric or logical. Realizing there was a non-numeric value, I fixed it using:

dat[10, 2] #- value in Age column on row 10

I tried it again and still got the same message.

Edit: I need to assign the converted column to a variable.

Jack Armstrong
  • Please provide a reproducible example - [please see this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MHammer Jul 06 '18 at 02:23
  • If it is non-numeric, then after assigning you need `as.numeric(dat$Age)`. BTW, if there is a non-numeric value, you don't need to replace it by hand: `as.numeric` converts it to `NA` by default, i.e. `mean(as.numeric(dat$Age), na.rm = TRUE)` – akrun Jul 06 '18 at 02:23

3 Answers

You may try casting your input vector to numeric before taking the mean. Non-numeric values become NA, which you can then remove in your call to mean using na.rm=TRUE:

x <- c(1, 'Hello', 3)
mean(as.numeric(x), na.rm=TRUE)

[1] 2

This will generate a warning message ("NAs introduced by coercion"), but at least it will run.

As a general comment, it is best practice not to mix numeric and non-numeric data in the same vector, column, etc.

Tim Biegeleisen
  • Is there a way to return the non-numeric values so I can identify where the problems come from? – Jack Armstrong Jul 06 '18 at 02:26
  • @JackArmstrong You may use `grepl` in this case, see Arun's answer below. But, I think that `as.numeric` will do a better job of flagging non numeric data than a regex which we write. – Tim Biegeleisen Jul 06 '18 at 02:37
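A minimal sketch of the follow-up question, assuming `x` stands in for `dat$Age` (the values are made up). It converts quietly, then flags entries that were not `NA` before conversion but are `NA` after it, which is one way to let `as.numeric` do the flagging as suggested above.

```r
# `x` is a hypothetical stand-in for dat$Age, stored as text.
x <- c("25", "31", "n/a", NA, "40", "forty")

# Convert quietly; strings that cannot be parsed become NA.
num <- suppressWarnings(as.numeric(x))

# "Bad" entries: present before conversion, NA after it.
# Genuine missing values (NA in x) are not flagged.
bad <- is.na(num) & !is.na(x)

which(bad)               # positions of the offending entries
x[bad]                   # the offending values themselves
mean(num, na.rm = TRUE)  # mean of the entries that did convert
```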

Assigning a value to a single cell only replaces that value; it does not change the column type. We need

dat$Age <- as.numeric(dat$Age)
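A small reproduction of that point, using a hypothetical `dat` whose `Age` column is stored as text because one entry is not a number. Assigning a number into one cell leaves the column as character; only converting the whole column changes its type.

```r
# Hypothetical data frame standing in for `dat`.
dat <- data.frame(Name = c("A", "B", "C"),
                  Age  = c("25", "unknown", "45"),
                  stringsAsFactors = FALSE)

dat[2, 2] <- 35                  # overwrite the bad cell with a number
class(dat$Age)                   # still "character": 35 is stored as "35"

dat$Age <- as.numeric(dat$Age)   # convert the whole column
class(dat$Age)                   # now "numeric"
mean(dat$Age, na.rm = TRUE)
```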

Also, as commented above, calling as.numeric(dat$Age) directly on a column with non-numeric elements gives a warning and fills in NA for those elements, which can then be identified with is.na:

i1 <- is.na(dat$Age)

Another approach, which identifies the non-numeric rows without a warning, is grepl. A pattern that matches an optional leading minus sign (-) followed by digits and decimal points, anchored from start (^) to end ($), covers most cases:

i1 <- !grepl("^-?[0-9.]+$", dat$Age)

mean(dat$Age, na.rm = TRUE)
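To see where the regex and the coercion approach disagree, here is a quick comparison on made-up strings. The regex accepts malformed input such as "1.2.3" and rejects scientific notation such as "1e2", while `as.numeric()` does the opposite; which behavior you want depends on the data.

```r
# Hypothetical sample values.
ages <- c("25", "-3", "4.5", "1.2.3", "1e2", "abc")

# Regex from the answer: optional minus sign, then only digits and dots.
flag_regex <- !grepl("^-?[0-9.]+$", ages)

# Coercion: anything as.numeric() cannot parse becomes NA.
flag_coerce <- is.na(suppressWarnings(as.numeric(ages)))

data.frame(ages, flag_regex, flag_coerce)
```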
akrun
  • So I used as.numeric and it looks like that cleared up the issue. But when it prints, it shows a bunch of numbers that are not the ones in the dataset. – Jack Armstrong Jul 06 '18 at 02:31
  • @JackArmstrong That is because your column is a `factor`. You need `as.numeric(as.character(dat$Age))` – akrun Jul 06 '18 at 02:32
  • @JackArmstrong By default `read.csv`/`read.table` read non-numeric columns as `factor` unless you specify `colClasses` or use `stringsAsFactors = FALSE`. If you created the data with `data.frame`, the default is also `stringsAsFactors = TRUE`. (These defaults applied before R 4.0.0; since then `stringsAsFactors` defaults to `FALSE`.) – akrun Jul 06 '18 at 02:35
  • So even though you change the value in a cell, the column stays non-numeric. That makes sense. I had another numeric column that worked fine before, but this is very confusing for a beginner. – Jack Armstrong Jul 06 '18 at 02:36
  • @JackArmstrong I think you may want to use `read_csv` or `read_table` from `readr`, or `fread` from `data.table`, which do not convert strings to factors by default – akrun Jul 06 '18 at 02:40
  • @JackArmstrong I make a point of looking at the structure of the dataset (e.g. with `str`) before doing anything else, because it reveals problems like these and saves time. – akrun Jul 06 '18 at 02:43
  • @JackArmstrong Yes, I think so. If you need to change the default behavior, change it in `col_types` – akrun Jul 06 '18 at 02:45
  • @JackArmstrong Check the output of `read_csv("x,y,z\n1,2,a\n3,4,b")`. The factor type abbreviation is not mentioned in the documentation, but you can specify it with `col_factor` – akrun Jul 06 '18 at 02:47
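The factor pitfall from these comments can be reproduced with a tiny inline CSV (hypothetical data). `stringsAsFactors = TRUE` is passed explicitly to mimic the old `read.csv()` default, since R 4.0.0 changed it to `FALSE`.

```r
# Hypothetical CSV: one non-numeric entry makes the whole column text.
csv <- "Age\n30\n25\nunknown\n40"
dat <- read.csv(text = csv, stringsAsFactors = TRUE)

str(dat)                           # Age is a factor, levels "25" "30" "40" "unknown"

as.numeric(dat$Age)                # WRONG: level codes 2 1 4 3, not the ages
as.numeric(as.character(dat$Age))  # right: 30 25 NA 40 (with a coercion warning)

# Alternatively, keep text columns as character when reading:
dat2 <- read.csv(text = csv, stringsAsFactors = FALSE)
class(dat2$Age)                    # "character"
```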

How about this?

  x <- c(1, 2, 3, "xxx")
  grepl("[[:digit:]]", x)
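Note that "[[:digit:]]" only checks whether a string contains at least one digit, so mixed strings also pass. A quick check on made-up inputs, with an anchored pattern for comparison:

```r
# Hypothetical inputs; c() coerces everything to character.
x <- c("1", "2", "3", "xxx", "a1", "12abc")

grepl("[[:digit:]]", x)     # any digit anywhere: "a1" and "12abc" pass too
grepl("^[[:digit:]]+$", x)  # anchored: only strings made entirely of digits pass
```

The anchored pattern accepts only non-negative integers; decimals or minus signs need a wider pattern, such as the one in akrun's answer.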
MSW Data