
I have the following dataframe (df) - there are more columns, but these are the relevant columns:

    ID  Cost
    1    $100
    1    $200
    2    $50
    2    $0
    2    $40
    3    $10
    4    $100
    5    $0
    5    $50

I would like to subset this dataframe so that if any cost for a particular ID is $0, all rows for that ID are removed (not just the row containing the $0).

Therefore, in this example, ID 2 and 5 contain a $0, so all of ID 2 and ID 5 rows should be deleted.

Here is the resulting df I would like:

    ID  Cost 
    1    $100
    1    $200
    3    $10
    4    $100

Could someone help with this? I tried some combinations of the subset function, but it didn't work.

On a similar note: I have another dataframe that contains NAs. How would I solve the same problem if the values to look for were NAs instead of 0s?

Thanks in advance!!

user4918087
  • A data.table option is: `library(data.table); setDT(df)[, if(!any(Cost=='$0')) .SD, ID]` – akrun Jun 09 '15 at 17:09

3 Answers


Try this, which flags every ID that has a missing cost or a cost of "$0" and drops all rows for those IDs:

subset(df, !df$ID %in% df$ID[is.na(df$Cost) | df$Cost == "$0"])

This gives you:

  ID Cost
1  1 $100
2  1 $200
6  3  $10
7  4 $100
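
To try this on data that also contains a missing value, here is a minimal reproducible sketch. It rebuilds the question's data frame and adds one hypothetical `NA` row for ID 4 (that row is an assumption, not part of the original data); `bad_ids` and `result` are illustrative names.

```r
# Question's data plus one hypothetical NA cost for ID 4 (added for illustration)
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5),
  Cost = c("$100", "$200", "$50", "$0", "$40", "$10", "$100", NA, "$0", "$50"),
  stringsAsFactors = FALSE
)

# IDs that have at least one missing cost or a cost of "$0"
bad_ids <- unique(df$ID[is.na(df$Cost) | df$Cost == "$0"])

# Keep only the rows whose ID was not flagged
result <- subset(df, !ID %in% bad_ids)
result
```

With the extra row, IDs 2, 4 and 5 are flagged, so only the rows for IDs 1 and 3 remain. Putting `is.na()` first matters little here, but it makes the intent explicit: `NA == "$0"` is `NA`, and `TRUE | NA` evaluates to `TRUE`.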
Aaron Katch
  • +1 nice job using `subset`. You may save keystrokes with `with(df, subset(df,!ID %in% ID[is.na(Cost) | Cost == "$0"]))` – Pierre L Jun 09 '15 at 17:04

Try:

df[!df$ID %in% df$ID[df$Cost=="$0"],]
C_Z_

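You can compute the IDs that you want to remove with something like tapply. Note that this assumes Cost is stored as a number; with the "$0" strings shown in the question, compare against "$0" instead of 0: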

(has.zero <- tapply(df$Cost, df$ID, function(x) sum(x == 0) > 0))
#     1     2     3     4     5 
# FALSE  TRUE FALSE FALSE  TRUE 

Then you can subset, limiting to IDs that you don't want to remove:

df[!df$ID %in% names(has.zero)[has.zero],]
#   ID Cost
# 1  1  100
# 2  1  200
# 6  3   10
# 7  4  100

This is pretty flexible, because it enables you to limit IDs based on more complicated criteria (e.g. "the average cost for the ID must be at least xyz").
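
As a sketch of such a criterion, the example below keeps only IDs whose average cost is at least a threshold. The threshold of 50, the helper column `CostNum`, and the names `min_mean`, `keep` and `kept` are all hypothetical choices for illustration; the data is the question's, with Cost stored as text.

```r
# Question's data; Cost is stored as text with a leading "$"
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c("$100", "$200", "$50", "$0", "$40", "$10", "$100", "$0", "$50"),
  stringsAsFactors = FALSE
)

# Strip the literal "$" so the costs can be averaged
df$CostNum <- as.numeric(sub("$", "", df$Cost, fixed = TRUE))

# Hypothetical rule: keep an ID only if its mean cost is at least min_mean
min_mean <- 50
keep <- tapply(df$CostNum, df$ID, function(x) mean(x) >= min_mean)

# Subset to the IDs that satisfy the rule
kept <- df[df$ID %in% names(keep)[keep], ]
kept
```

Only the function passed to `tapply` changes between rules; the final subsetting step stays the same, here retaining the IDs that pass rather than dropping the ones that fail.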

josliber