I have a dataset with 1000 samples that I am using to understand the diet patterns of students. I want to know how many students have eaten each of the following:
1. only apple
2. only banana
3. only orange
4. all three fruits
5. apple + banana
6. apple + orange
7. banana + orange
It doesn’t seem clear what your question is. Are you using a specific language or database for this? Generically, you can produce all combinations of three parameters with three nested `for` loops. – crosen9999 Jul 04 '22 at 06:32
@crosen9999 Thank you for the suggestion. I am sorry for the ambiguity of my question. I am quite new to R and have not been able to write a loop. – user142632 Jul 04 '22 at 09:28
Welcome to SO. Please do not post data as an image. We need the data to show you how to do it, and you will hardly find anyone who is going to copy it by hand. You can use `dput` and add the output to your question. It also often helps when people see that you tried solving the issue yourself. Maybe this post helps: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Jan Jul 04 '22 at 21:08
3 Answers
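library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-student_id) %>%                     # one row per student/fruit pair
  group_by(student_id) %>%
  summarise(name = toString(name[value > 0])) %>%   # join the fruits each student ate
  count(name)                                       # tally students per combination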
# A tibble: 5 x 2
  name                      n
  <chr>                 <int>
1 Apple                     5
2 Apple, Banana, orange     1
3 Apple, orange             2
4 Banana                    3
5 orange                    2

Onyambu
You could do:
library(tidyverse)

data <-
  tibble(student = c(1,2,3,4,5),
         apple = c(1,0,0,1,1),
         banana = c(0,0,1,0,1),
         orange = c(0,1,0,1,1))

data |>
  pivot_longer(-student, names_to = "fruit") |>
  filter(value == 1) |>
  group_by(student) |>
  summarise(fruit = paste(fruit, collapse = "+")) |>
  count(fruit)
Output:
# A tibble: 5 × 2
  fruit                   n
  <chr>               <int>
1 apple                   1
2 apple+banana+orange     1
3 apple+orange            1
4 banana                  1
5 orange                  1
With the full dataset, every combination that occurs in it will appear in the result.
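If a combination never occurs (for example apple + banana in the five-row example), the pipeline simply has no row for it. A minimal sketch of one way to report all seven combinations from the question, including zero counts, is shown below. It assumes the fruit columns are named `apple`, `banana`, and `orange` as in the example data; the helper names `fruits`, `all_combos`, and `counts` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data copied from the answer above
data <- tibble(
  student = c(1, 2, 3, 4, 5),
  apple   = c(1, 0, 0, 1, 1),
  banana  = c(0, 0, 1, 0, 1),
  orange  = c(0, 1, 0, 1, 1)
)

# Assumed: fruit names listed in the same order as the columns
fruits <- c("apple", "banana", "orange")

# Build every non-empty subset of fruits as an "a+b" style label (7 labels)
all_combos <- unlist(lapply(seq_along(fruits), function(k) {
  combn(fruits, k, FUN = paste, collapse = "+")
}))

counts <- data |>
  pivot_longer(-student, names_to = "fruit") |>
  filter(value == 1) |>
  group_by(student) |>
  summarise(fruit = paste(fruit, collapse = "+")) |>
  # Turn the label into a factor so unused combinations stay as levels
  mutate(fruit = factor(fruit, levels = all_combos)) |>
  # .drop = FALSE keeps empty factor levels with n = 0
  count(fruit, .drop = FALSE)

print(counts)
```

Students who ate none of the fruits are removed by the `filter()` step, so they are not counted in any row.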

harre
Here is a base R option using `table` + `paste`:
as.data.frame(
  table(
    trimws(
      do.call(
        paste,
        as.data.frame(
          ifelse(df[-1] > 0,
            names(df[-1])[col(df[-1])],
            ""
          )
        )
      )
    )
  )
)
which gives
                 Var1 Freq
1               apple    5
2       apple  orange    2
3 apple banana orange    1
4              banana    3
5              orange    2
Or
as.data.frame(
  table(
    apply(
      as.data.frame(ifelse(df[-1] > 0, names(df[-1])[col(df[-1])], NA)),
      1,
      function(x) toString(na.omit(x))
    )
  )
)
which gives
                   Var1 Freq
1                 apple    5
2 apple, banana, orange    1
3         apple, orange    2
4                banana    3
5                orange    2
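Both base R variants above rely on the same trick: `col()` returns, for every cell, the number of the column it sits in, so `names(df[-1])[col(df[-1])]` yields the column name for each cell, and `ifelse()` keeps that name only where the value is positive. The sketch below walks through those steps on a tiny data frame; the three-student data and the names `m`, `lab`, and `combo` are hypothetical and only there for illustration.

```r
# Hypothetical three-student example, not the asker's data
df <- data.frame(
  student = 1:3,
  apple   = c(1, 0, 1),
  banana  = c(0, 1, 1),
  orange  = c(0, 0, 1)
)

m <- df[-1]              # drop the id column, keep only the fruit columns

# col(m) is an integer matrix where each cell holds its own column number,
# so indexing names(m) with it gives the fruit name for every cell
lab <- ifelse(m > 0, names(m)[col(m)], NA)
print(lab)

# Collapse each row to a comma-separated label, skipping the NA cells
combo <- apply(lab, 1, function(x) toString(na.omit(x)))
print(combo)

# Tally how many students share each label
print(as.data.frame(table(combo)))
```

Using `NA` plus `na.omit()` avoids the extra blanks that appear when empty strings are pasted together, which is why the second variant does not need `trimws()`.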
Data
df <- data.frame(
  student = 1:13,
  apple = c(0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0),
  banana = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1),
  orange = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)

ThomasIsCoding