Find column values that are shared among other column values R

Question

I have the following df that shows the food some people eat for a day.

df = data.frame("Name" = c("Brian", "Brian", "Brian",
                       "Alice", "Alice", "Alice",
                       "Paul", "Paul", "Paul",
                       "Clair", "Clair", "Clair"),
            "Meal" = c("Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner"),
            "Food" = c("Waffle", "Chicken", "Steak",
                       "Waffle", "Soup", "Steak",
                       "Waffle", "Chicken", "Chicken",
                       "Waffle", "Soup", "Chicken")

I want to find a food that was eaten by 100% of people, a food that was eaten by 75% of people, and a food that was eaten by 50% of people. In this case Waffle was eaten by everyone, chicken was eaten by 75% of people, and soup/steak was eaten by 50% of people.

EDIT:
Expected Output: The percentage of people who ate each food
Waffle - 100%
Chicken - 75%
Steak - 50%
Soup - 50% .

Chuck P · Answer 1 · 2020-06-26T20:13:32.083

library(dplyr)

df %>% 
  distinct(Name, Food) %>% 
  group_by(Food) %>% 
  summarise(WhatPercent = n() / nlevels(as.factor(.$Food))) %>%
  arrange(desc(WhatPercent)) %>%
  mutate(WhatPercent = paste0(WhatPercent * 100, "%"))


#> `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  Food    WhatPercent
  <chr>   <chr>      
1 Waffle  100%       
2 Chicken 75%        
3 Soup    50%        
4 Steak   50%

Your data

df <- data.frame("Name" = c("Brian", "Brian", "Brian",
                           "Alice", "Alice", "Alice",
                           "Paul", "Paul", "Paul",
                           "Clair", "Clair", "Clair"),
                "Meal" = c("Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner"),
                "Food" = c("Waffle", "Chicken", "Steak",
                           "Waffle", "Soup", "Steak",
                           "Waffle", "Chicken", "Chicken",
                           "Waffle", "Soup", "Chicken")
)

score 2 · Answer 2 · answered Jun 27 '20 at 00:18

Here's an approach that uses table:

x <- ((with(df, table(Food, Name)) >= 1) + 0)
## OR x <- table(unique(df[, c("Food", "Name")]))
x
#          Name
# Food      Alice Brian Clair Paul
#   Chicken     0     1     1    1
#   Soup        1     0     1    0
#   Steak       1     1     0    0
#   Waffle      1     1     1    1
rowSums(x)/ncol(x)
# Chicken    Soup   Steak  Waffle 
#    0.75    0.50    0.50    1.00

score 1 · Answer 3 · answered Jun 26 '20 at 19:04

You can use dplyr and janitor:

library(dplyr)
library(janitor)

df %>% 
  tabyl(Food, Name) %>% 
  mutate_if(is.numeric, ~ ifelse(. >= 1, 1, 0)) %>% 
  mutate(n = length(.) - 1) %>% 
  adorn_totals('col') %>% 
  mutate(Percent = paste0((Total - n)/n*100, "%")) %>% 
  select(Food, Percent)

This gives you:

    Food Percent
 Chicken     75%
    Soup     50%
   Steak     50%
  Waffle    100%

You can also change the last select argument to select(-c(n, Total)) if you want to keep counts for each person:

    Food Alice Brian Clair Paul Percent
 Chicken     0     1     1    1     75%
    Soup     1     0     1    0     50%
   Steak     1     1     0    0     50%
  Waffle     1     1     1    1    100%

Daniel O · Accepted Answer · 2020-06-26T19:04:08.307

0

Edit: With expected output explained

apply(aggregate(Food ~ Name, df, table)[-1],2, function(x) sum(x!=0)/length(x))*100

Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
          75           50           50          100

Old Answers

You should give an expected output as this question is unclear. Here is some code to rearrange your data into a form you may find more suitable for calculated statistics.

aggregate(Food ~ Meal, df, table)

      Meal Food.Chicken Food.Soup Food.Steak Food.Waffle
1 Breakfast            0         0          0           4
2    Dinner            2         0          2           0
3     Lunch            2         2          0           0

to find the most popular Food at each meal

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

aggregate(Food ~ Meal, df, function(x) levels(x)[Modes(x)] )

       Meal           Food
1 Breakfast         Waffle
2    Dinner Steak, Chicken
3     Lunch  Chicken, Soup

Credit for Modes Function

edited Jun 26 '20 at 19:04

answered Jun 26 '20 at 18:17

Daniel O

4,258
6
20

2

I _think_ OP looks for `colMeans(with(unique(df[, c("Name", "Food")]), table(Name, Food)))` – markus Jun 26 '20 at 18:19
The meal column is unimportant. The goal is to find similarities in the food people eat, and if there is a food eaten by everyone/ food eaten by most people. Regardless of the meal it was eaten at. Sorry for the confusion. – Eric Chagnon Jun 26 '20 at 18:35
You still need to put an expected output.. But I've added to my answer. Hope it helps – Daniel O Jun 26 '20 at 18:54
Sorry about that, this is my first time asking a question. I've added the expected output – Eric Chagnon Jun 26 '20 at 18:58
@markus 's answer is what I'm looking for! Marking this question as the answer even though the answer is in the comments. Thanks everyone! – Eric Chagnon Jun 26 '20 at 19:42
If stripped to its basic structure `apply(aggregate(Food ~ Name, df, table)[-1],2, function(x) sum(x!=0)/length(x))*100` looks suspiciously like `apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, sum)`. You could/should at least credit my answer as being a starting point for your answer. – Chris Ruehlemann Jun 26 '20 at 20:10
@ChrisRuehlemann, looks pretty different to me. Ignoring the aggregate that has been in all my answers, apply is a common approach to col/row wise operations. Our functions(x) are totally different and have different outputs. – Daniel O Jun 27 '20 at 03:13
No it's not the use of `aggregate` or `apply`by themselves but the way they are integrated into one another that is highly similar – Chris Ruehlemann Jun 27 '20 at 06:47
@ChrisRuehlemann The custom function for one is in `apply` and for the other is in `aggregate` so the implementation is quite different. – Daniel O Jun 27 '20 at 22:15

Chris Ruehlemann · Answer 5 · 2020-06-26T19:00:13.310

0

Is this what you want?

apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, sum)
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
           3            2            2            4

Or would you prefer this?

apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, 
      function(x)  ifelse(sum(x) == length(unique(df$Name)), "100%",  
                          ifelse(sum(x) == length(unique(df$Name)) - 1, "75%",
                                 ifelse(sum(x) == length(unique(df$Name)) - 2, "50%", 
                                        ifelse(sum(x) == length(unique(df$Name)) - 3, "25%", "0%")))))
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
       "75%"        "50%"        "50%"       "100%"

edited Jun 26 '20 at 19:00

answered Jun 26 '20 at 18:27

Chris Ruehlemann

20,321
4
12
34

Just commented on the other answer, but the meal the food was eaten at is unimportant. I just want to see if there's a food that was eaten by everyone, or a food that was just eaten by most people. Regardless of the meal it was eaten at. Sorry for the confusion. – Eric Chagnon Jun 26 '20 at 18:39
That looks like a complicated version of `table(df$Food)` that gives an erroneous error for chicken. There are 4 instances of chicken. – Daniel O Jun 26 '20 at 18:57
Look at what OP says! – Chris Ruehlemann Jun 26 '20 at 18:59

Find column values that are shared among other column values R

5 Answers5