Remove rows from data.frame from continuous data containing symbols

Question

I have a data.frame with a column that should have continuous data. However, some of the rows have values with '~' and '<' symbols.

c.a <- c(1,5,3,7,4,9,2,3,7)
c.b <- c("a", "c", "f", "s", "r", "q", "w", "e", "t")
c.d <- c(1,4,6, '<5', '~34', 65, 45, 2, 6)
x <- data.frame(c.a, c.b, c.d)

The objective is to remove rows 4 and 5 from the data.frame x.

Hopefully this is not a repeated question, but I have done a quick search and cannot find a solution. Thanks in advance.

Thanks everyone. All comments provide a correct solution for those interested. — Christopher Kavazos, May 31 '18 at 03:34

chinsoon12 · Answer 1 · 2018-05-31T03:27:16.400

You can try converting the column to numeric and discarding the rows that fail to convert:

x[!is.na(as.numeric(as.character(x$c.d))),]

output:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
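
A slightly longer sketch of the same idea, in case it helps to see the steps separately. The names num, keep and cleaned are only illustrative, and writing the numeric values back into c.d is an extra step that the one-liner above does not perform:

```r
# Rebuild the example data from the question.
c.a <- c(1, 5, 3, 7, 4, 9, 2, 3, 7)
c.b <- c("a", "c", "f", "s", "r", "q", "w", "e", "t")
c.d <- c(1, 4, 6, "<5", "~34", 65, 45, 2, 6)
x <- data.frame(c.a, c.b, c.d)

# as.character() first, so the result is the same whether c.d is stored
# as a factor or as a character vector.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
num <- suppressWarnings(as.numeric(as.character(x$c.d)))

keep <- !is.na(num)        # TRUE where the value parsed as a number
cleaned <- x[keep, ]       # rows 4 and 5 are dropped
cleaned$c.d <- num[keep]   # optional extra step: store the column as numeric
```

Subsetting keeps the original row names, which is why the printed result jumps from row 3 to row 6.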

edited May 31 '18 at 03:27

answered May 31 '18 at 02:57

chinsoon12

sorry the output is not what you wanted? "remove rows 4 and 5 from the data.frame x" – chinsoon12 May 31 '18 at 03:05
The output you have given is exactly what I want, but your code doesn't give me that result when I run it. Not sure why? – Christopher Kavazos May 31 '18 at 03:07
maybe restart your R. your x is prob not the same data.frame as what you have posted. using generic names for variables can cause problems sometimes – chinsoon12 May 31 '18 at 03:09
@chriskaye what *does* it give? Need to know if we want to diagnose the issue – rg255 May 31 '18 at 08:03
@griffinevo Yes the issue has been resolved. See the above solution. Thanks :-) – Christopher Kavazos Jun 19 '18 at 02:07

score 2 · Answer 2 · answered May 31 '18 at 03:02

You can use grepl() to keep only the rows whose c.d value contains nothing but digits (and, optionally, a decimal point):

x[!grepl("[^0-9.]", as.character(x$c.d)), ]

Output:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
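
If you would rather state what a valid value looks like than list what is forbidden, an anchored pattern is one option. This is only a sketch: the pattern and the name is_plain_number are my own assumptions about what counts as a number in this column (optional minus sign, digits, optional decimal part), so adjust it if your data contains things like 1e5 or .5.

```r
# Example data from the question; c.d is a character vector because it
# mixes numbers with strings such as "<5".
x <- data.frame(c.a = c(1, 5, 3, 7, 4, 9, 2, 3, 7),
                c.b = c("a", "c", "f", "s", "r", "q", "w", "e", "t"),
                c.d = c(1, 4, 6, "<5", "~34", 65, 45, 2, 6))

# Assumed definition of a "plain number":
# optional minus sign, one or more digits, optional decimal part.
is_plain_number <- grepl("^-?[0-9]+(\\.[0-9]+)?$", as.character(x$c.d))

x[is_plain_number, ]   # keeps rows 1, 2, 3, 6, 7, 8, 9
```

Note that this only filters the rows; c.d is still a character (or factor) column afterwards and would need converting before doing arithmetic on it.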

andrew_reece

I am getting the same as the above comment. The code doesn't seem to be giving me the same output. – Christopher Kavazos May 31 '18 at 03:13
You're saying if you copy your example code in your post exactly, and then run the solutions that chinsoon and I gave, you are not getting the output we have posted? If so, I would agree with chinsoon's comment - trying a fresh start of your R environment. Both solutions work as demonstrated on my end, and it's just base R so there shouldn't be any package issues. (If it still doesn't work, can you describe what output you're seeing?) – andrew_reece May 31 '18 at 03:15
@andrew_reece and @chinsoon12 I appreciate your assistance. Yep. I have tried in R, restarted it, and also tried in RStudio. No packages loaded. My output is exactly the same as 'x' before the line is run. It is not working on my full dataset either. – Christopher Kavazos May 31 '18 at 03:25

PKumar · Accepted Answer · 2018-05-31T04:20:52.923

I think that if you have not set stringsAsFactors = FALSE, you may not get the desired result. You can set it when creating the data frame:

x <- data.frame(c.a, c.b, c.d, stringsAsFactors = FALSE)
x$c.d <- as.numeric(x$c.d)
x[complete.cases(x),]

You can also set options(stringsAsFactors = FALSE) at the top of your script, which helps in many situations (use it if it suits you).

Running the above should give you the desired output.

Alternatively, you can use this solution (thanks to @Onyambu), again assuming c.d is a character column rather than a factor:

na.omit(transform(x,c.d=as.numeric(c.d)))
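
To illustrate why the factor setting matters, here is a small sketch of what I believe was happening (the names f, codes and values are just for illustration). In R versions before 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so c.d would have been stored as a factor:

```r
c.d <- c(1, 4, 6, "<5", "~34", 65, 45, 2, 6)

f <- factor(c.d)   # what data.frame() stores when stringsAsFactors = TRUE

# as.numeric() on a factor returns the internal level codes, not the labels,
# so nothing becomes NA and no row would be dropped.
codes <- as.numeric(f)
anyNA(codes)                 # FALSE

# Going through as.character() recovers the labels first, so the
# non-numeric entries become NA as intended.
values <- suppressWarnings(as.numeric(as.character(f)))
which(is.na(values))         # 4 5
```

That would match the symptom described in the comments, where the filtered data frame came back identical to x.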

edited May 31 '18 at 04:20

answered May 31 '18 at 03:23

PKumar

na.omit(transform(x,c.d=as.numeric(c.d))) – Onyambu May 31 '18 at 04:00

rg255 · Answer 4 · 2018-05-31T10:37:04.123

Assuming c.d is a factor, convert it to numeric with as.numeric(levels(x[, 'c.d']))[x[, 'c.d']] and then index the NAs out of the data frame:

x <- x[!is.na(as.numeric(levels(x[, 'c.d']))[x[, 'c.d']]), ]

This produces a warning message (warnings ≠ errors), which you can ignore: it appears because converting non-numeric strings produces NAs, and that is exactly what we want here.

Warning message:
In `[.data.frame`(x, !is.na(as.numeric(levels(x[, "c.d"]))[x[, "c.d"]]),  :
  NAs introduced by coercion

And this is the result, just as you ask for:

  c.a c.b c.d
1   1   a   1
2   5   c   4
3   3   f   6
6   9   q  65
7   2   w  45
8   3   e   2
9   7   t   6
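
For anyone puzzled by the levels() idiom, this sketch breaks it into two steps on a standalone factor (f, level_values and per_row are illustrative names, not part of the code above):

```r
f <- factor(c(1, 4, 6, "<5", "~34", 65, 45, 2, 6))

# Step 1: convert each distinct level once; "<5" and "~34" become NA.
level_values <- suppressWarnings(as.numeric(levels(f)))

# Step 2: index by the factor, which uses its integer codes, to expand
# the converted levels back to one value per original element.
per_row <- level_values[f]

which(is.na(per_row))   # 4 5
```

It gives the same values as as.numeric(as.character(f)) but converts each level only once. It does rely on the column being a factor: for a character column levels() returns NULL, so the idiom no longer applies.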
