0

I have a dataframe looks like this, with NA values

id cat1 cat2 cat3 cat4
1 apple banana banana orange
2 orange banana apple orange
3 apple NA NA orange
4 orange banana apple NA

Each id is expected to have a common categories. so the table shall look like:

id cat
1 banana
2 orange
3 NA
4 NA

is there a simple way using base R? thank you

4 Answers4

2

We can use the Mode function from here

Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

and apply it to every row.

cbind(df[1], cat = apply(df[-1], 1, Mode))

#  id    cat
#1  1 banana
#2  2 orange

data

df <- structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana", 
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • this method doesn't work though on rows with NA value – user9776841 Feb 19 '21 at 11:27
  • In the function you can write the first line as `x <- na.omit(x)`. – Ronak Shah Feb 19 '21 at 11:29
  • thanks. it takes the first value as the result if there is no common string. – user9776841 Feb 19 '21 at 11:37
  • @user9776841 Please update your post to include all the relevant edge cases that your data can take and show expected output for it. It is difficult to generalise an answer when you just show 2 most simple cases. Also it is usually better if you add data using `dput` (as I have at the bottom of my answer) which we can copy and also get unambiguous format of your data. – Ronak Shah Feb 19 '21 at 12:49
  • ach ok, got it thank you! – user9776841 Feb 20 '21 at 04:07
2

A data.table option

setDT(df)[, .(cat = names(tail(sort(table(na.omit(unlist(.SD)))), 1))), id]

gives

   id    cat
1:  1 banana
2:  2 orange

A base R option with apply

cbind(
  df[1],
  cat = apply(
    df[-1],
    1,
    function(x) names(tail(sort(table(na.omit(unlist(x)))), 1))
  )
)

gives

  id    cat
1  1 banana
2  2 orange

Data

> dput(df)
structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana",
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

Does this work:

library(dplyr)
library(tidyr)
df %>% pivot_longer(cols = -id) %>% count(id, value) %>% 
     group_by(id) %>% 
            summarise(cat = value[which.max(n)])
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
     id cat   
  <int> <chr> 
1     1 banana
2     2 orange
Karthik S
  • 11,348
  • 2
  • 11
  • 25
0

Using tidyverse

library(dplyr)
library(purrr)
df %>%
  transmute(id, cat = pmap_chr(.[-1], ~ Mode(c(...))))
#  id    cat
#1  1 banana
#2  2 orange

where

Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

data

df <- structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana",
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))
akrun
  • 874,273
  • 37
  • 540
  • 662