Here is my dataframe:

df <- data.frame(a = c(1:10),
                 b= c(11:15, NA, NaN, '', 20, 22))

a   b
1   11          
2   12          
3   13          
4   14          
5   15          
6   NA          
7   NaN         
8               
9   20          
10  22

What I need to do is extract the rows where the value in column b is not a number. In this case, I need to extract rows where column a is 7,8,9. I definitely need a general solution that works for any large dataset.
I tried:

df %>% filter(!is.numeric(b))

But it does not work, and I have no idea how to achieve this. Thanks in advance for any help.

zesla
  • Related: https://stackoverflow.com/questions/24129124/how-to-determine-if-a-character-vector-is-a-valid-numeric-or-integer-vector – MrFlick Jan 25 '18 at 18:38
  • row no. 9 is 20, which is a number. – akrun Jan 25 '18 at 18:38
  • Related: https://stackoverflow.com/questions/13638377/test-for-numeric-elements-in-a-character-string – MrFlick Jan 25 '18 at 18:39
  • Related: https://stackoverflow.com/questions/21196106/finding-non-numeric-data-in-an-r-data-frame-or-vector – MrFlick Jan 25 '18 at 18:40
  • When you say "extract," do you mean filter out/delete those rows, or keep only those rows, or create a separate table with those rows, or what? – ulfelder Jan 25 '18 at 18:41
  • I meant keep only those rows. But I think it's great to know either way. The related link from @MrFlick is great. But I would like to know how to do it in dplyr... – zesla Jan 25 '18 at 18:44
  • To make it "dplyr" -- just put those expressions in your `filter()`. dplyr is not that different from "base" R. I think all those existing questions answer this question just fine. If you have a problem using those solutions, you should demonstrate the problem more clearly. – MrFlick Jan 25 '18 at 18:49
  • @MrFlick what I want to know is if there is any more concise way to do it in dplyr. According to the answer in the link, it needs quite long expression. If there is no better way to do it, that is fine. thanks – zesla Jan 25 '18 at 18:58

2 Answers

Considering the data as:

df <- data.frame(a = c(1:10),
                 b= c(11:15, NA, NaN, '', 20, 22))

The first issue I can see is that b is stored as a factor (the default for data.frame() before R 4.0.0; from 4.0.0 onward it would be a character column), which can be checked with:

str(df)

giving us

'data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: Factor w/ 9 levels "","11","12","13",..: 2 3 4 5 6 NA 9 1 7 8

With this in mind, we can tweak your existing approach to something like:

df %>%
  mutate(b = as.numeric(as.character(b))) %>%
  filter(is.nan(b) | is.na(b))

which gives us:

  a   b
1 6  NA
2 7 NaN
3 8  NA
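
As a side note on why the attempt in the question keeps every row, here is a minimal base R sketch. It assumes the same data as above, built with stringsAsFactors = FALSE so that b is a character column on any R version; the variable names are only illustrative.

```r
df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22),
                 stringsAsFactors = FALSE)

# is.numeric() inspects the column's type, so it returns one value, not ten
whole_column <- is.numeric(df$b)

# negating it gives a single TRUE, which is recycled to select every row
kept <- df[!whole_column, ]

# an element-wise test has one entry per row:
# convert to numeric, then flag the entries that did not parse
per_row <- is.na(as.numeric(df$b))
```

The same recycling happens inside filter(), which is why the condition has to be a per-row test such as the is.na() check used above.
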
Aramis7d

This will leave only the rows that have numbers:

Base R:

new <- df[!is.na(as.numeric(as.character(df$b))),]

Reading from the innermost parentheses outward: as.character() converts everything in column b to character, and as.numeric() then converts that to numeric. Any value that cannot be converted to a number becomes NA. Finally, !is.na() keeps only the rows whose converted value is not NA, so the non-numeric rows are filtered out. This is all base R.
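
Since the question asks to keep the non-numeric rows rather than drop them, the same conversion can be wrapped in a small helper and used in either base R or dplyr. This is only a sketch: not_a_number() is a hypothetical helper name, not part of any package, and the data is built with stringsAsFactors = FALSE as an assumption so the result does not depend on the R version.

```r
library(dplyr)

df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a value cannot be read as a number.
# as.character() makes it work for factor columns too; suppressWarnings()
# hides the coercion warning raised by strings such as "abc".
not_a_number <- function(x) {
  is.na(suppressWarnings(as.numeric(as.character(x))))
}

# base R: keep only the non-numeric rows
non_numeric_base <- df[not_a_number(df$b), ]

# dplyr: the same condition inside filter()
non_numeric_dplyr <- df %>% filter(not_a_number(b))

# dropping those rows instead, as in the base R line above
numeric_only <- df %>% filter(!not_a_number(b))
```

Note that NaN counts as "not a number" here because is.na() is TRUE for NaN as well as NA.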

leeum