
I'm trying to add a flag column to a data frame in R: when certain conditions are met for a name, the third column should be set to 1. Here is my example dataset:

Name   Inventory  BLT_Flag
Amy    Bacon      1
Amy    Lettuce    1
Amy    Tomato     1
John   Bacon      0
John   Tomato     0
Katie  Bacon      1
Katie  Lettuce    1
Katie  Tomato     1

Basically, I'm trying to compute BLT_Flag. In this example, Amy and Katie both get a flag of 1 because their inventories include all the ingredients for a BLT, while John is missing "Lettuce". I'm having a hard time writing a loop to create this flag. Any suggestions are greatly appreciated!

David Arenburg
MRP
  • A reproducible example would be greatly appreciated. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – s_baldur Feb 28 '17 at 16:13
    I entered my previous example incorrectly. I hope the above table makes more sense. – MRP Feb 28 '17 at 16:19
  • Are duplicate lines possible, or is it the case that if a certain name appears 3 times then `BLT = 1`? – s_baldur Feb 28 '17 at 16:44
  • For the sake of this, duplicates aren't possible (there will be unique identifiers). If a certain name appears 3 times then BLT=1. – MRP Feb 28 '17 at 16:52

2 Answers


Using the information in the comments that if a name appears three times the BLT_Flag should be 1, we can simply count how many times each name appears and test whether that count is three.
Then build the BLT_Flag for each row by looking up its name. BTW, I stored your data in a data.frame named Supplies.
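The answer doesn't show how Supplies was created. A minimal sketch, assuming the question's table is typed in by hand (with Amy's "lettuce" capitalized to match the other rows); only Name and Inventory are included, since BLT_Flag is the column to be computed.

```r
# Hypothetical reconstruction of the question's table as a data frame named Supplies
Supplies <- data.frame(
  Name      = c("Amy", "Amy", "Amy", "John", "John", "Katie", "Katie", "Katie"),
  Inventory = c("Bacon", "Lettuce", "Tomato", "Bacon", "Tomato",
                "Bacon", "Lettuce", "Tomato"),
  stringsAsFactors = FALSE  # keep Name as character so it can be used for name lookups
)
```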

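# TRUE for each name that appears exactly three times
SupplyTable = table(Supplies$Name) == 3
SupplyTable 
#   Amy  John Katie 
#  TRUE FALSE  TRUE

# Look up each row's name and convert TRUE/FALSE to 1/0
BLT_Flag = as.numeric(SupplyTable[Supplies$Name])
BLT_Flag
# [1] 1 1 1 0 0 1 1 1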

However, as @Sotos pointed out, this solution is very specific to this problem. A more general solution is to provide a list of ingredients and test whether all of them are available for each name. That can be accomplished with:

IngredientList = c("Bacon", "Tomato", "Lettuce")
# For each name, count how many required ingredients appear in its inventory
# and check that the count equals the number of required ingredients
SupplyTable = sapply(unique(Supplies$Name), 
    function(x) sum(!is.na(match(IngredientList, 
        Supplies$Inventory[Supplies$Name == x]))) == length(IngredientList))
SupplyTable
#   Amy  John Katie 
#  TRUE FALSE  TRUE 

AllIngredientsFlag = as.numeric(SupplyTable[Supplies$Name])
AllIngredientsFlag
# [1] 1 1 1 0 0 1 1 1

As before, we build a table that indicates, for each name, whether all ingredients are present, and then use it to create the flag.
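To see why the count-based version is specific to this data, here is a sketch with a hypothetical name, Bob, who has three rows but Cheese instead of Lettuce: counting rows flags Bob, while the ingredient check does not. For the per-name check it swaps in tapply() with all(... %in% ...) as a compact alternative to the sapply()/match() construction above; Supplies2 and Bob are made up for illustration.

```r
# Hypothetical data: Bob has three rows, but Cheese instead of Lettuce
Supplies2 <- data.frame(
  Name      = c("Amy", "Amy", "Amy", "Bob", "Bob", "Bob"),
  Inventory = c("Bacon", "Lettuce", "Tomato", "Bacon", "Tomato", "Cheese"),
  stringsAsFactors = FALSE
)
IngredientList <- c("Bacon", "Tomato", "Lettuce")

# Count-based flag: 1 whenever a name has exactly three rows
CountFlag <- as.numeric((table(Supplies2$Name) == 3)[Supplies2$Name])

# Ingredient-based flag: TRUE per name only if every required ingredient is present,
# then looked up for each row by name, as in the answer
HasAll <- tapply(Supplies2$Inventory, Supplies2$Name,
                 function(inv) all(IngredientList %in% inv))
IngredientFlag <- as.numeric(HasAll[Supplies2$Name])

CountFlag       # Bob is (wrongly) flagged
IngredientFlag  # Bob is not flagged
```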

G5W
  • sorry...was thinking something completely different. I was thinking about bacon, tomato, cheese combo, but then again I am hungry :) – Sotos Feb 28 '17 at 17:31
  • OP's request was for BLT, not BTC. You should open a new question for that. – G5W Feb 28 '17 at 23:37
  • Nevertheless your solution is not general enough to account for that. – Sotos Mar 01 '17 at 07:29
  • Colon Dash RightParenthesis – G5W Mar 01 '17 at 12:48
  • make it a semicolon and you got me. On a serious note though, It is better not to overfit answers. The more generic they are, the more value they have. – Sotos Mar 01 '17 at 12:57
    @Sotos You are right. I have generalized the solution. – G5W Mar 01 '17 at 14:15

Create data

library(dplyr)
dtf <- read.table(text = "Name  Inventory 
Amy    Bacon       
Amy    Lettuce     
Amy    Tomato      
John   Bacon       
John   Tomato      
Katie  Bacon       
Katie  Lettuce     
Katie  Tomato      ", header = TRUE, stringsAsFactors = FALSE)

Generate all combinations of name and ingredient for the desired recipe

desiredrecipe <- expand.grid(Inventory = c("Bacon", "Lettuce", "Tomato"),
                             Name = unique(dtf$Name),
                             stringsAsFactors = FALSE) 
numberofingredients <- length(unique(desiredrecipe$Inventory))

Check whether each name has every ingredient of the desired recipe

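dtf2 <- dtf %>% 
    # mark every ingredient that is in the inventory as present
    mutate(present = 1) %>% 
    # add the missing name/ingredient combinations (present is NA for those)
    full_join(desiredrecipe, by = c("Name","Inventory")) %>% 
    group_by(Name) %>% 
    # sum(present) is NA for any name that is missing an ingredient
    mutate(BLT_Flag = ifelse(sum(present)==numberofingredients,1,0)) 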

# replace NA values by 0
dtf2$BLT_Flag[is.na(dtf2$BLT_Flag)] <- 0
dtf2



#   Name Inventory present BLT_Flag
#   <chr>     <chr>   <dbl>    <dbl>
# 1   Amy     Bacon       1        1
# 2   Amy   Lettuce       1        1
# 3   Amy    Tomato       1        1
# 4  John     Bacon       1        0
# 5  John    Tomato       1        0
# 6 Katie     Bacon       1        1
# 7 Katie   Lettuce       1        1
# 8 Katie    Tomato       1        1
# 9  John   Lettuce      NA        0
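For readers who would rather not load dplyr, the same join-based idea can be sketched in base R with merge() and ave(). This is a swapped-in base R variant, not the answer's code: it left-joins the inventory onto the required combinations only, so extra items in an inventory would not affect the flag. The data and recipe are recreated so the block runs on its own; the names recipe and merged are illustrative.

```r
# Recreated example data (same rows as dtf above)
dtf <- data.frame(
  Name      = c("Amy", "Amy", "Amy", "John", "John", "Katie", "Katie", "Katie"),
  Inventory = c("Bacon", "Lettuce", "Tomato", "Bacon", "Tomato",
                "Bacon", "Lettuce", "Tomato"),
  stringsAsFactors = FALSE
)
recipe <- c("Bacon", "Lettuce", "Tomato")

# Every Name x ingredient combination the recipe requires
desiredrecipe <- expand.grid(Inventory = recipe, Name = unique(dtf$Name),
                             stringsAsFactors = FALSE)

# Left join: keep all required combinations; missing ingredients get present = NA
merged <- merge(desiredrecipe, transform(dtf, present = 1),
                by = c("Name", "Inventory"), all.x = TRUE)

# Within each name: 1 if no required ingredient is missing, else 0
merged$BLT_Flag <- ave(merged$present, merged$Name,
                       FUN = function(p) as.numeric(!anyNA(p)))
merged
```

Note that merge() sorts the result by Name and Inventory, so the rows come back in a different order than in dtf.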
Paul Rougieux