R: Remove rows from data frame based on values in several columns

Question

I have the following dataframe (df) - there are more columns, but these are the relevant columns:

I would like to subset this data frame so that if any cost for a particular ID is $0, all rows for that ID are removed.

In this example, IDs 2 and 5 each contain a $0, so every row for ID 2 and ID 5 should be deleted.

Here is the resulting df I would like:

Could someone help with this? I tried some combinations of the subset function, but it didn't work.

On a similar note: I have another data frame that contains NAs. How would I solve the same problem if the values to match were NAs instead of $0s?

Thanks in advance!!

A data.table option is `library(data.table); setDT(df)[, if(!any(Cost=='$0')) .SD, ID]` – akrun Jun 09 '15 at 17:09

score 4 · Accepted Answer · answered Jun 09 '15 at 16:58

Try this:

subset(df,!df$ID %in% df$ID[is.na(df$Cost) | df$Cost == "$0"])

This gives you:

  ID Cost
1  1 $100
2  1 $200
6  3  $10
7  4 $100
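
For anyone who wants to run this end to end, here is a minimal sketch. The data frame is a hypothetical stand-in, since the question's original table is not reproduced here: rows 3-5 and 8-9 are invented so that IDs 2 and 5 each contain a "$0", and `bad_ids` is just an illustrative name.

```r
# Hypothetical sample data: rows 3-5 and 8-9 are invented for illustration.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c("$100", "$200", "$50", "$0", "$75", "$10", "$100", "$0", "$20"),
  stringsAsFactors = FALSE
)

# IDs that have at least one missing or "$0" cost
bad_ids <- unique(df$ID[is.na(df$Cost) | df$Cost == "$0"])

# Keep only the rows whose ID is not in that set
result <- subset(df, !ID %in% bad_ids)
print(result)
```

Pulling the flagged IDs into their own variable first makes it easy to inspect which IDs are about to be dropped before subsetting.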

Aaron Katch

+1 Nice job using `subset`. You may save keystrokes with `with(df, subset(df,!ID %in% ID[is.na(Cost) | Cost == "$0"]))` – Pierre L Jun 09 '15 at 17:04

score 3 · Answer 2 · answered Jun 09 '15 at 16:50

Try

df[!df$ID %in% df$ID[df$Cost=="$0"],]
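
Note that this version does not cover the NA case from the question: `df$Cost == "$0"` evaluates to `NA` where `Cost` is missing, so those IDs are not flagged. One possible tweak, sketched here on hypothetical data, is to test with `%in%`, which returns only `TRUE` or `FALSE` and matches `NA` against `NA`. The names `flagged` and `kept` are illustrative.

```r
# Hypothetical data: ID 2 has a "$0" cost, ID 3 has a missing cost.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 4),
  Cost = c("$100", "$200", "$0", "$50", NA, "$10"),
  stringsAsFactors = FALSE
)

# %in% never returns NA, and an NA in the lookup set matches NA costs
flagged <- df$Cost %in% c("$0", NA)

# Drop every row whose ID has at least one flagged cost
kept <- df[!df$ID %in% df$ID[flagged], ]
print(kept)
```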

C_Z_

score 1 · Answer 3 · answered Jun 09 '15 at 16:53

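You can compute the IDs that you want to remove with something like tapply (this assumes Cost is numeric; if it is stored as a string such as "$0", compare against "$0" instead of 0):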

(has.zero <- tapply(df$Cost, df$ID, function(x) sum(x == 0) > 0))
#     1     2     3     4     5 
# FALSE  TRUE FALSE FALSE  TRUE

Then you can subset, limiting to IDs that you don't want to remove:

df[!df$ID %in% names(has.zero)[has.zero],]
#   ID Cost
# 1  1  100
# 2  1  200
# 6  3   10
# 7  4  100

This is pretty flexible, because it enables you to limit IDs based on more complicated criteria (e.g. "the average cost for the ID must be at least xyz").
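
As a rough illustration of that flexibility, the sketch below drops IDs whose average cost falls below a threshold. The data, the threshold of 50, and the name `low.mean` are all assumptions made for the example.

```r
# Hypothetical numeric data; the threshold of 50 is arbitrary.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 4),
  Cost = c(100, 200, 0, 40, 10, 100)
)

# Named logical vector: TRUE for IDs whose mean cost is below the threshold
low.mean <- tapply(df$Cost, df$ID, function(x) mean(x) < 50)

# Drop every row belonging to a flagged ID
kept <- df[!df$ID %in% names(low.mean)[low.mean], ]
print(kept)
```

Only the function passed to `tapply` changes between criteria; the subsetting step stays the same.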

josliber

Thanks @josliber! What if I want to remove the rows based on NAs? – user4918087 Jun 09 '15 at 16:56
Then you would change `sum(x == 0) > 0` to `sum(is.na(x)) > 0`. – josliber Jun 09 '15 at 16:56
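
A minimal sketch of the NA variant, on hypothetical numeric data. The second half, which flags missing or zero costs in one pass, is an addition that was not part of the suggestion above; `has.na`, `has.bad`, `no.na` and `clean` are illustrative names.

```r
# Hypothetical numeric data: ID 2 has a zero cost, ID 3 has a missing cost.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 4),
  Cost = c(100, 200, 0, 40, NA, 100)
)

# Flag IDs with any missing cost, then drop their rows
has.na <- tapply(df$Cost, df$ID, function(x) sum(is.na(x)) > 0)
no.na <- df[!df$ID %in% names(has.na)[has.na], ]

# Flag IDs with a missing OR zero cost. `TRUE | NA` is TRUE, so the NA
# produced by `NA == 0` does not leak into the result.
has.bad <- tapply(df$Cost, df$ID, function(x) any(is.na(x) | x == 0))
clean <- df[!df$ID %in% names(has.bad)[has.bad], ]
print(clean)
```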
