I have a data.frame with a column that should have continuous data. However, some of the rows have values with '~' and '<' symbols.

c.a <- c(1,5,3,7,4,9,2,3,7)
c.b <- c("a", "c", "f", "s", "r", "q", "w", "e", "t")
c.d <- c(1,4,6, '<5', '~34', 65, 45, 2, 6)
x <- data.frame(c.a, c.b, c.d)

The objective is to remove rows 4 and 5 from the data.frame x.

Hopefully this is not a repeated question, but I have done a quick search and cannot find a solution. Thanks in advance.

4 Answers

You can try converting the column to numeric and discarding the rows whose values are non-numeric:

x[!is.na(as.numeric(as.character(x$c.d))),]

output:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
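
As a self-contained sketch of the same idea, the snippet below rebuilds the example data and keeps only the rows whose c.d value parses as a number. The names keep and x_clean are introduced here purely for illustration, and suppressWarnings() is an addition that only silences the expected "NAs introduced by coercion" warning.

```r
# Rebuild the example data from the question
c.a <- c(1, 5, 3, 7, 4, 9, 2, 3, 7)
c.b <- c("a", "c", "f", "s", "r", "q", "w", "e", "t")
c.d <- c(1, 4, 6, "<5", "~34", 65, 45, 2, 6)
x <- data.frame(c.a, c.b, c.d)

# Going through as.character() first means this works whether c.d is stored
# as a factor or as a character column; "<5" and "~34" become NA
keep <- !is.na(suppressWarnings(as.numeric(as.character(x$c.d))))

# `keep` and `x_clean` are hypothetical names used only in this sketch
x_clean <- x[keep, ]

# Optional: store the surviving values as real numbers
x_clean$c.d <- as.numeric(as.character(x_clean$c.d))
print(x_clean)
```

Logical indexing keeps the original row names, so the remaining rows are still labelled 1, 2, 3, 6, 7, 8 and 9.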
chinsoon12

You can use grepl() to filter:

x[!grepl("[^0-9.]", as.character(x$c.d)), ]

Output:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
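
A slightly stricter regex-based sketch, in case it helps: instead of rejecting particular characters, it keeps only values that look like a plain number from start to end. The pattern and the name is_number are my own assumptions, not part of the question; this pattern deliberately rejects negative numbers and scientific notation, so adjust it if your real data contains those.

```r
# Reduced version of the example data (c.b omitted for brevity)
x <- data.frame(
  c.a = c(1, 5, 3, 7, 4, 9, 2, 3, 7),
  c.d = c("1", "4", "6", "<5", "~34", "65", "45", "2", "6")
)

# TRUE only when the whole value is digits with an optional decimal part;
# as.character() makes this safe for factor columns too
is_number <- grepl("^[0-9]+(\\.[0-9]+)?$", as.character(x$c.d))

# `is_number` is a hypothetical name used only in this sketch
print(x[is_number, ])
```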
andrew_reece
  • I am getting the same as the above comment. The code doesn't seem to be giving me the same output. – Christopher Kavazos May 31 '18 at 03:13
  • You're saying if you copy your example code in your post exactly, and then run the solutions that chinsoon and I gave, you are not getting the output we have posted? If so, I would agree with chinsoon's comment - trying a fresh start of your R environment. Both solutions work as demonstrated on my end, and it's just base R so there shouldn't be any package issues. (If it still doesn't work, can you describe what output you're seeing?) – andrew_reece May 31 '18 at 03:15
  • @andrew_reece and @chinsoon12 I appreciate your assistance. Yep. I have tried in R, restarted it, and also tried in RStudio. No packages loaded. My output is exactly the same as 'x' before the line is run. It is not working on my full dataset either. – Christopher Kavazos May 31 '18 at 03:25

I think that if you have not set stringsAsFactors = FALSE, you may not get the desired results. You can set it when creating the data frame:

x <- data.frame(c.a, c.b, c.d, stringsAsFactors = FALSE)
x$c.d <- as.numeric(x$c.d)   # "<5" and "~34" become NA (with a warning)
x[complete.cases(x), ]       # drop the rows that contain NA

You may also set options(stringsAsFactors = FALSE) at the top of your code. This helps in many situations, so use it if it suits you.

Running the above should give you the desired output.

You may also use this solution (thanks to @Onyambu):

na.omit(transform(x, c.d = as.numeric(c.d)))
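
To illustrate why the factor setting matters, here is a small sketch (the names f, codes and values are mine, for illustration only). Calling as.numeric() directly on a factor returns its internal level codes rather than the values it displays, so nothing becomes NA and nothing gets dropped. Note that since R 4.0.0, data.frame() no longer converts strings to factors by default, so this mainly affects older R versions or columns that are explicitly factors.

```r
# The c.d column from the question, stored as a factor
f <- factor(c("1", "4", "6", "<5", "~34", "65", "45", "2", "6"))

# Direct conversion gives the integer level codes, so there are no NAs
# and complete.cases() / na.omit() would keep every row
codes <- as.numeric(f)
print(anyNA(codes))

# Converting via character gives the real numbers, with NA for "<5" and "~34"
values <- suppressWarnings(as.numeric(as.character(f)))
print(which(is.na(values)))
```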
PKumar

Convert the factor to numeric (as.numeric(levels(x[, 'c.d']))[x[, 'c.d']]) and then index the NAs out of the data frame. This assumes c.d is stored as a factor:

x <- x[!is.na(as.numeric(levels(x[, 'c.d']))[x[, 'c.d']]), ]

This produces a warning message (a warning is not an error), which you can ignore: it appears because converting non-numeric strings produces NAs, and that is exactly what we want here.

Warning message:
In `[.data.frame`(x, !is.na(as.numeric(levels(x[, "c.d"]))[x[, "c.d"]]),  :
  NAs introduced by coercion

And this is the result, just as you asked for:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
rg255