Sorry, I'm an absolute beginner, so I have some very basic questions!
I have a very large data set that lists individual transactions by household. An example is below.
#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 5   hh3  transport                 1
# 6   hh3       food                10
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10
I want to create a new data frame that lists all transactions for ONLY the households that have at least one transaction in the "water" category. (E.g., I would want a df without hh3 above, because that household has no expenses in "water".)
As a first step, I have a data frame with one column (hh_ids) that contains only the household IDs I want to keep. How do I then subset my larger data frame to remove all rows of transactions from households that have no expenses in the "water" category?
Data
## data from @gung
d <- read.table(text="hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
hh4 water 10", header=TRUE)
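
For reference, here is a minimal sketch of the result I am after, assuming the filtering can be done with base R's `%in%` operator. The sample data is rebuilt with `data.frame()` so the snippet runs on its own. The column name `hh_id` inside the helper data frame and the result name `d_water` are my own placeholders, not anything from the real data set.

```r
# Sample data, rebuilt here so the sketch is self-contained
d <- data.frame(
  hh_id = c("hh1", "hh1", "hh1", "hh2", "hh3", "hh3", "hh4", "hh4", "hh4"),
  trans_type = c("food", "water", "transport", "water", "transport",
                 "food", "food", "transport", "water"),
  transaction_value = c(4, 5, 4, 3, 1, 10, 5, 15, 10)
)

# One-column data frame of households with at least one "water" transaction
# (assumed layout of my existing hh_ids data frame)
hh_ids <- data.frame(hh_id = unique(d$hh_id[d$trans_type == "water"]))

# Keep every row whose household ID appears in hh_ids;
# this drops all rows for households with no "water" expenses
d_water <- d[d$hh_id %in% hh_ids$hh_id, ]
```

With the sample data, `d_water` should keep all rows for hh1, hh2 and hh4 and drop both rows for hh3.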