Omitting rows from a dataframe by specifying certain criteria

Question

I have tried to find an answer to this in previous questions. The closest I found was about removing rows that contain a blank element or an NA. That approach should have worked for my case, but I cannot get it to do what I want.

I have a dataframe imported from Excel that I am reshaping so it is laid out for efficient statistical analysis. The equipment I am using returns data in a strange format.

Every second row in the dataframe contains the RSD values, which are the calculated errors from the software. In the column "Sample_Id", every second row has the Id "RSD".

I would like to modify the dataframe by removing the rows containing "RSD". To do this I have tried:

DTA <- DTA[!(DTA$Sample_Id == "RSD"),]

Here DTA is my dataframe. This is similar to the solution posted for the question about blanks and NA, but for some reason it does not work in my case.
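A common reason an exact comparison like this silently matches nothing is invisible whitespace in the labels. The sketch below uses hypothetical data (the values, and the leading space in " RSD", are assumptions for illustration) to show one way to check. When a data frame is printed, character columns are right-aligned by default, so a leading space is easy to miss.

```r
# Hypothetical data mimicking an instrument export: the RSD rows
# carry a leading space that is hard to spot in normal printing
DTA <- data.frame(
  Sample_Id = c("A1", " RSD", "A2", " RSD"),
  Value = c(10.1, 2.3, 9.8, 1.9),
  stringsAsFactors = FALSE
)

# The exact comparison finds no matching rows
which(DTA$Sample_Id == "RSD")   # integer(0)

# Printing the distinct values with quotes exposes the hidden whitespace
print(unique(DTA$Sample_Id), quote = TRUE)

# nchar() confirms the extra character: " RSD" has 4 characters, not 3
nchar(unique(DTA$Sample_Id))
```

If `nchar()` reports more characters than the visible label, the values carry extra whitespace.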

My dataframe is made up of four columns with 176 entries, though this varies between experiments.

I would rather not omit the rows by row number.

I just asked a similar question https://stackoverflow.com/questions/48403916/how-to-subset-a-data-frame-if-the-column-contains-nas. You should replace the `==` in your code with `%in%` and it will likely work. — qdread, Jan 23 '18 at 15:04
What exactly do you mean by "for some reason does not work for my purpose"? `install.packages("fortunes");require(fortunes);fortune(324)` Does `which(DTA$Sample_Id == "RSD")` print out the row numbers with "RSD"? — Bernhard, Jan 23 '18 at 15:07
Hello everyone, I have just solved it, and for those who have similar problems, here's how: I checked to see what type of data was stored in that column, and converted it to a string by using as.character(). Then I checked to see what was stored in the even elements of the column and it returned " RSD" instead of "RSD". This was presumably caused by the equipment's software. I used the code I wrote above with " RSD" and it worked. Thank you to those who also provided solutions, I think qdread's solution should also work. — Jan 23 '18 at 15:44
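The `==` and `%in%` forms differ mainly in how they treat missing values: `==` returns NA for a missing Sample_Id, and indexing a data frame with NA produces a row of NAs, while `%in%` always returns TRUE or FALSE. Both still perform exact string matching, so neither on its own matches " RSD" against "RSD". A minimal sketch with hypothetical toy data:

```r
# Hypothetical toy data: one Sample_Id is missing (NA)
DTA <- data.frame(
  Sample_Id = c("S1", "RSD", NA, "RSD"),
  Value = c(1.2, 0.5, 3.4, 0.7),
  stringsAsFactors = FALSE
)

# With ==, the NA Id gives an NA in the logical index,
# which turns into a row of all NAs in the result
by_eq <- DTA[!(DTA$Sample_Id == "RSD"), ]

# With %in%, every element is TRUE or FALSE, so the NA row is kept intact
by_in <- DTA[!(DTA$Sample_Id %in% "RSD"), ]

print(by_eq)
print(by_in)
```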

score 0 · Answer 1 · answered Jan 23 '18 at 15:11


You want to remove rows where the variable Sample_Id is the string "RSD"? If so, you can use the dplyr package:

library(dplyr)
# keep only the rows whose Sample_Id is not exactly "RSD"
DTA <- filter(DTA, Sample_Id != "RSD")
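If the labels may carry stray whitespace, as turned out to be the case here, the same dplyr filter can compare trimmed values instead. A sketch, assuming dplyr is installed and using hypothetical example data:

```r
library(dplyr)

# Hypothetical data: RSD rows exported with a leading space
DTA <- data.frame(
  Sample_Id = c("A1", " RSD", "A2", " RSD"),
  Value = c(10.1, 2.3, 9.8, 1.9),
  stringsAsFactors = FALSE
)

# trimws() strips leading and trailing whitespace before the comparison,
# so "RSD", " RSD" and "RSD " are all treated as the same label.
# Note: filter() drops rows where the condition is NA (e.g. a missing Sample_Id).
DTA <- filter(DTA, trimws(Sample_Id) != "RSD")
```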

answered Jan 23 '18 at 15:11

Antonios


score 0 · Answer 2 · answered Jan 23 '18 at 16:15

I have just solved it, and for those who have similar problems, here's how:

I checked to see what type of data was stored in that column, and converted it to a string by using as.character().

Then I checked to see what was stored in the even elements of the column and it returned " RSD" instead of "RSD".

This was presumably caused by the equipment's software. I then used the code from my question with " RSD" instead of "RSD", and it worked fine.
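Rather than hard-coding " RSD", one option is to strip the whitespace once with base R's trimws() and then use the original comparison unchanged. This is a sketch on hypothetical data; the factor column is an assumption (for example, read.csv() produced factors by default before R 4.0.0).

```r
# Hypothetical data: a factor column whose RSD labels carry a leading space
DTA <- data.frame(
  Sample_Id = factor(c("A1", " RSD", "A2", " RSD")),
  Value = c(10.1, 2.3, 9.8, 1.9)
)

# Convert to character (as in the answer) and strip surrounding whitespace
DTA$Sample_Id <- trimws(as.character(DTA$Sample_Id))

# The original subsetting now works with the plain label
DTA <- DTA[!(DTA$Sample_Id == "RSD"), ]
```

This overwrites Sample_Id with the trimmed labels; if the original values should be kept, compare against trimws(DTA$Sample_Id) without reassigning the column.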

Thank you to those who also provided solutions, I think qdread's solution should also work.
