3

sorry an absolute beginner so have some very basic questions!

I have a very large data set that lists individual transactions by a household. Example is below.

#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 5   hh3  transport                 1
# 6   hh3       food                10
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10

I want to to create a new data frame that has all transactions listed for ONLY the households that have transactions in the "water" category. (Eg, I would want a df without hh3 above because they have not had any expenses in "water")

as a first step, I have a data frame with one column (hh_ids) that only has the household IDs of the ones that I want. How do I then subset my larger dataframe to remove all rows of transactions that are not from a household that have expenses in the "water" category?

Data

## data from @gung
d <- read.table(text="hh_id   trans_type  transaction_value
hh1 food    4
hh1 water   5
hh1 transport   4
hh2 water   3
hh3 transport   1
hh3 food    10
hh4 food    5
hh4 transport   15
hh4 water   10", header=T)
rawr
  • 20,481
  • 4
  • 44
  • 78
  • 1
    `df2 <- df1[df1$trans_type != "water",]` – alexwhitworth Jun 19 '16 at 16:19
  • 1
    @akrun dupe it up... this is certainly not an original question – alexwhitworth Jun 19 '16 at 16:26
  • You can use `library(data.table);setDT(df1)[,if(any(trans_type=="water")) .SD , by = hh_id]` or `library(dplyr); df1 %>% group_by(hh_id) %>% filter(any(trans_type=="water"))` I am sure there would be dupes, but unfortunately couldn't find it now. (It took almost 10 mins to search google and still couldn't find it. Isn't it more efficient to type an answer instead of searching the dupes?) – akrun Jun 19 '16 at 16:35
  • or `d[with(d, ave(trans_type %in% 'water', hh_id, FUN = sum)) > 0, ]` – rawr Jun 19 '16 at 18:11
  • 1
    Thanks for the help! I'm sure I'm not the first to ask this question but after about 20 min of googling I didn't feel like I even knew enough to phrase it so that I could ask for what I needed in a google search. Thanks again! – Ashley Thomas Jun 19 '16 at 22:26

1 Answers1

4
d <- read.table(text="hh_id   trans_type  transaction_value
hh1 food    4
hh1 water   5
hh1 transport   4
hh2 water   3
hh3 transport   1
hh3 food    10
hh4 food    5
hh4 transport   15
hh4 water   10", header=T)

dw <- as.character(with(d, hh_id[trans_type=="water"])) 
ds <- d[which(d$hh_id%in%dw),]
ds
#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10
gung - Reinstate Monica
  • 11,583
  • 7
  • 60
  • 79