What is the frequency of any cell of column to appear in any other column?

Question

I have a question regarding comparing columns in a data frame.... Say I have a few data that look like this:

Unique <- c("apple", "orange", "melon", "car", "mouse", "headphones", "light")
a1 <- c("apple", "tomato", "banana", "dog", "cat", "headphones", "future")
a2 <- c("apple", "orange", "pear", "monkey", "dog", "cat", "river")
a3 <- c("tomato", "pineapple", "cherry", "car", "space", "mars", "rocket")
df <- data.frame(Unique, a1, a2, a3)
df

> ## df
##        Unique         a1     a2        a3
## 1:      apple      apple  apple    tomato
## 2:     orange     tomato orange pineapple
## 3:      melon     banana   pear    cherry
## 4:        car        dog monkey       car
## 5:      mouse        cat    dog     space
## 6: headphones headphones    cat      mars
## 7:      light     future  river    rocket

The question I am trying to answer is: what is the frequency of each cell of column "Unique" to appear in the entire data frame except in Unique column?

I would like an output that looks something like this:

 apple     2 
 orange    1 
  melon    0 
    car    1  
  mouse    0
headphones 0
  light    0

because in the entire data frame except the "Unique" column, apple appears 2 times, orange appears 1 time, melon appears 0 time, so on and so forth...

How would you go about getting this?

Also, how would we sort them based on the number of frequency, say highest to lowest?

I have been trying to figure this out for a couple of days now, and I just can't crack it... any help would be extremely appreciated!

p.s. also, in R, it seems like each "cell" in a dataframe is not referred to a cell..? am I correct? What are they referred to, elements?

Welcome to stack overflow and congrats well formatted first post. In a R when you refer to values in a table you have rows and columns. I would still like to recommend these [guidelines](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for your future posts. — DJJ, Apr 07 '20 at 22:32
Thank you so much DJJ! and Thank you for the guidelines; I will give it a thorough read... — Joon Ho Seo, Apr 08 '20 at 00:38

akrun · Answer 1 · 2020-04-07T22:05:24.620

We can unlist the columns other than the 'Unique', convert it to factor with levels specified as 'Unique' and get the table in base R

table(factor(unlist(df[-1]), levels = df$Unique))
#      apple     orange      melon        car      mouse headphones      light 
#         2          1          0          1          0          1          0

Or using tidyverse

library(dplyr)
library(tidyr)
df %>% 
   pivot_longer(cols = -Unique) %>%
   mutate(value = factor(value, levels = unique(Unique))) %>% 
   filter(!is.na(value)) %>%
   count(value, .drop = FALSE)
# A tibble: 7 x 2
#  value          n
#* <fct>      <int>
#1 apple          2
#2 orange         1
#3 melon          0
#4 car            1
#5 mouse          0
#6 headphones     1
#7 light          0

Amazing solution! thank you for the answer! I am super excited about this :D — Joon Ho Seo, Apr 08 '20 at 00:33

Len Greski · Answer 2 · 2020-04-07T22:26:22.127

Here is a solution based on the tidyverse.

 Unique <- c("apple", "orange", "melon", "car", "mouse", "headphones", "light")
a1 <- c("apple", "tomato", "banana", "dog", "cat", "headphones", "future")
a2 <- c("apple", "orange", "pear", "monkey", "dog", "cat", "river")
a3 <- c("tomato", "pineapple", "cherry", "car", "space", "mars", "rocket")
df <- data.frame(Unique, a1, a2, a3,stringsAsFactors = FALSE)
df

library(tidyr)
library(dplyr)
df[,2:4] %>% pivot_longer(.,cols=c("a1","a2","a3")) %>% 
     group_by(value) %>% summarise(.,count = n()) %>% 
     right_join(.,df[1],by = c('value' = 'Unique')) %>% 
     mutate(count = ifelse(is.na(count),0,count))

...and the output.

# A tibble: 7 x 2
  value      count
  <chr>      <dbl>
1 apple          2
2 orange         1
3 melon          0
4 car            1
5 mouse          0
6 headphones     1
7 light          0
>

Thank you for the solution. I still have to learn to use tidyr and dplyr properly...I will use your solution for studying. Thank you! — Joon Ho Seo, Apr 08 '20 at 00:34

score 0 · Answer 3 · answered Apr 07 '20 at 22:26

with library(data.table)

Transforme your data.frame into a data.table

setDT(df)

Then you can melt the data.table with id="Unique". It is very convenient as for each values of Unique you have a value of all the columns of df in one column

##  melt(df,id.vars = "Unique")
##         Unique variable      value
##  1:      apple       a1      apple
##  2:     orange       a1     tomato
##  3:      melon       a1     banana
##  4:        car       a1        dog
##  5:      mouse       a1        cat
##  6: headphones       a1 headphones
##  7:      light       a1     future
##  8:      apple       a2      apple
##  9:     orange       a2     orange
## 10:      melon       a2       pear
## 11:        car       a2     monkey
## 12:      mouse       a2        dog
## 13: headphones       a2        cat
## 14:      light       a2      river
## 15:      apple       a3     tomato
## 16:     orange       a3  pineapple
## 17:      melon       a3     cherry
## 18:        car       a3        car
## 19:      mouse       a3      space
## 20: headphones       a3       mars
## 21:      light       a3     rocket
##         Unique variable      value

Finally for each value of Unique we just have to count how many values in the Unique column are equal to value.

melt(df,id.vars = "Unique")[,sum(Unique==value),Unique]
##        Unique V1
## 1:      apple  2
## 2:     orange  1
## 3:      melon  0
## 4:        car  1
## 5:      mouse  0
## 6: headphones  1
## 7:      light  0

Thank you for the suggestion! I didn't think about melting the data.table. Thank you! — Joon Ho Seo, Apr 08 '20 at 00:35

What is the frequency of any cell of column to appear in any other column?

3 Answers3