0

Is there a simple way to remove all non-numeric values from a vector in R? Suppose we have:

vec <- c(1, 2, T, 'x', 'abc', '6', 7, F, F, 10)

I would like to receive:

c(1, 2, 7, 10)
heisenberg7584
  • Once you create `vec`, there are no numbers in it, all are strings. – r2evans Nov 14 '19 at 20:01
  • Why not `6`? If you want that, too, then `as.numeric(grep("^-?[0-9.]+$", c(1, 2, T, 'x', 'abc', '6', 7, F, F, 10), value=TRUE))`. – r2evans Nov 14 '19 at 20:01
  • If `6` is truly not wanted, then ... you cannot, period. By using `c(...)`, all elements within the vector are converted to the highest common class, which in this case is `character`, so there is no way to differentiate between what was originally `"6"` and what was originally `6`. If you truly want that, use a `list`. – r2evans Nov 14 '19 at 20:04

4 Answers

2

You can use a regex to keep only the elements that consist entirely of the digits 0-9 and periods, optionally preceded by a minus sign. The ^ anchors the match to the start of the string and $ to the end, so any element containing other characters (such as letters) is filtered out. Note that this keeps the '6'.

as.numeric(grep('^-?[0-9.]+$', vec, value = TRUE))
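
As a hedged illustration of where this pattern can surprise you, the sketch below runs it on the question's vector and on a few extra strings. The `odd` vector and the `as.numeric()`-based alternative are assumptions added for illustration, not part of the original answer.

```r
vec <- c(1, 2, TRUE, "x", "abc", "6", 7, FALSE, FALSE, 10)

# The pattern allows an optional leading minus, then only digits and periods.
keep <- grepl("^-?[0-9.]+$", vec)
nums <- as.numeric(vec[keep])
nums   # 1 2 6 7 10

# Caveat: strings such as "1.2.3" or "." also match the pattern,
# and as.numeric() would turn them into NA with a warning.
odd <- c("3.14", "1.2.3", ".", "-5")
grepl("^-?[0-9.]+$", odd)   # all TRUE

# Alternative sketch: let as.numeric() decide, then drop the NAs.
converted <- suppressWarnings(as.numeric(odd))
converted[!is.na(converted)]   # 3.14 -5
```

The `as.numeric()` route is more permissive than the regex (it also accepts forms such as scientific notation), so which one fits depends on what should count as a number in your data.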
ClancyStats
  • We think alike :-). BTW, you don't need to escape the `.` it doesn't mean "any character" inside the bracket-group. – r2evans Nov 14 '19 at 20:24
  • Huh, nice. I guess that makes sense - a wildcard inside the brackets would kinda defeat their purpose. – ClancyStats Nov 14 '19 at 20:36
  • It's one reason why some R programmers prefer `[.]` over `\\.` ... many (including me) find the double-backslash thing harder to read than the bracket notation. And R's preference towards double-backslashes (vice singles) means it is no better/worse in code-golf. – r2evans Nov 14 '19 at 20:42
2

`c` is a function that returns a vector where "all arguments are coerced to a common type... The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression."
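
To make the coercion concrete, here is a small sketch using the question's vector (written with `TRUE`/`FALSE` instead of `T`/`F`):

```r
# c() coerces mixed arguments to the highest type in the hierarchy.
class(c(TRUE, 1L))   # logical + integer  -> "integer"
class(c(1L, 2.5))    # integer + double   -> "numeric"
class(c(1, "x"))     # double + character -> "character"

vec <- c(1, 2, TRUE, "x", "abc", "6", 7, FALSE, FALSE, 10)
class(vec)           # "character"
vec[3]               # "TRUE" (now a string, no longer a logical)
vec[1]               # "1"    (indistinguishable in kind from "6")
```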

Thus, you need a container where you can have mixed data types to test which ones are numeric. A list gives you this. This approach leaves out the '6':

vec_list <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10)
unlist(vec_list[sapply(vec_list, function(x) class(x) == 'numeric')])

[1]  1  2  7 10

ThetaFC
  • It's hard to read all the code wrapped around in one line like this. Why not save the list to a variable first, so you don't have to declare it twice like this? Also, switching to a list works if the OP is able to create their data from the start, but what about cases where this is data they're reading in from some other source? – camille Nov 14 '19 at 22:50
  • Good suggestion for readability @camille, so I did that. For your other question, see https://stackoverflow.com/questions/35823093/read-data-as-a-list-in-r for possible help – ThetaFC Nov 15 '19 at 00:47
1

A simple solution is to apply `Filter` to a list, vec <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10), i.e.,

> unlist(Filter(is.numeric,vec))
[1]  1  2  7 10
ThomasIsCoding
0

Technically the term vector includes lists that do not have attributes other than names, so here is a vector built with `list` rather than with `c`.

vec <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10)

Each element of that list can then be tested for being numeric:

vec[sapply(vec, is.numeric)]
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 7

[[4]]
[1] 10
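
If an atomic vector is wanted rather than a list, the result can be flattened. The sketch below is an illustrative addition; the integer comparison at the end is there only to show how `is.numeric` differs from a `class(x) == "numeric"` test.

```r
vec <- list(1, 2, TRUE, "x", "abc", "6", 7, FALSE, FALSE, 10)

# Subsetting a list with `[` returns a list; unlist() flattens it.
nums <- unlist(vec[sapply(vec, is.numeric)])
nums   # 1 2 7 10

# is.numeric() is TRUE for integers as well as doubles,
# whereas class(x) == "numeric" is FALSE for an integer such as 5L.
is.numeric(5L)            # TRUE
class(5L) == "numeric"    # FALSE
```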
IRTFM