R most common string value

Question

I have a data frame that looks like this, with NA values:

id	cat1	cat2	cat3	cat4
1	apple	banana	banana	orange
2	orange	banana	apple	orange
3	apple	NA	NA	orange
4	orange	banana	apple	NA

Each id is expected to have one most common category, so the result should look like this:

id	cat
1	banana
2	orange
3	NA
4	NA

Is there a simple way to do this in base R? Thank you.

Ronak Shah · Accepted Answer · 2021-02-19T11:29:23.030

2

We can use the Mode function from here

Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

and apply it to every row.

cbind(df[1], cat = apply(df[-1], 1, Mode))

#  id    cat
#1  1 banana
#2  2 orange

data

df <- structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana", 
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))
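For reference, a minimal sketch of how the same Mode function behaves on the four rows from the question, NA cells included. The name df_na is an assumption for illustration; the data frame is rebuilt by hand from the table in the question.

```r
# Rebuild the four rows from the question, including the NA cells.
df_na <- data.frame(
  id   = 1:4,
  cat1 = c("apple", "orange", "apple", "orange"),
  cat2 = c("banana", "banana", NA, "banana"),
  cat3 = c("banana", "apple", NA, "apple"),
  cat4 = c("orange", "orange", "orange", NA),
  stringsAsFactors = FALSE
)

# Same Mode function as above: drop NAs, then take the most frequent value.
Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

res <- cbind(df_na[1], cat = apply(df_na[-1], 1, Mode))
res
# Rows 1 and 2 give "banana" and "orange".
# Rows 3 and 4 have no repeated value, so which.max() picks the first
# value seen in the row: "apple" for row 3 and "orange" for row 4.
```

On tied rows this returns the first value rather than the NA shown in the question's expected output, and a row made up entirely of NA would return a zero-length result.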

edited Feb 19 '21 at 11:29

answered Feb 19 '21 at 10:29


This method doesn't work on rows with NA values, though. – user9776841 Feb 19 '21 at 11:27
In the function you can write the first line as `x <- na.omit(x)`. – Ronak Shah Feb 19 '21 at 11:29
Thanks. It takes the first value as the result if no string is more common than the others. – user9776841 Feb 19 '21 at 11:37
@user9776841 Please update your post to include all the relevant edge cases that your data can take and show expected output for it. It is difficult to generalise an answer when you just show 2 most simple cases. Also it is usually better if you add data using `dput` (as I have at the bottom of my answer) which we can copy and also get unambiguous format of your data. – Ronak Shah Feb 19 '21 at 12:49
Ah OK, got it, thank you! – user9776841 Feb 20 '21 at 04:07
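If the goal is the output shown in the question, where rows 3 and 4 become NA, one possible sketch is a tie-aware variant. The helper name mode_or_na and the data frame df_na are hypothetical and not part of the answer above. It assumes that "no common string" means the highest count is shared by more than one value.

```r
# Hypothetical helper (not from the answer): return the most frequent
# non-NA value, or NA when the top count is shared or nothing is left.
mode_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA_character_)
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  top <- ux[counts == max(counts)]
  if (length(top) == 1L) top else NA_character_
}

# The four rows from the question, rebuilt by hand.
df_na <- data.frame(
  id   = 1:4,
  cat1 = c("apple", "orange", "apple", "orange"),
  cat2 = c("banana", "banana", NA, "banana"),
  cat3 = c("banana", "apple", NA, "apple"),
  cat4 = c("orange", "orange", "orange", NA),
  stringsAsFactors = FALSE
)

res <- cbind(df_na[1], cat = apply(df_na[-1], 1, mode_or_na))
res
# cat is "banana", "orange", NA, NA for ids 1 to 4
```

Under this assumption, a row with a single non-NA value returns that value. Whether that is wanted depends on edge cases the question does not spell out.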

ThomasIsCoding · Answer 2 · 2021-02-19T11:33:00.440

2

A data.table option

library(data.table)
setDT(df)[, .(cat = names(tail(sort(table(na.omit(unlist(.SD)))), 1))), id]

gives

   id    cat
1:  1 banana
2:  2 orange

A base R option with apply

cbind(
  df[1],
  cat = apply(
    df[-1],
    1,
    function(x) names(tail(sort(table(na.omit(unlist(x)))), 1))
  )
)

gives

  id    cat
1  1 banana
2  2 orange

Data

> dput(df)
structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana",
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))

edited Feb 19 '21 at 11:33

answered Feb 19 '21 at 11:09


This method doesn't work when rows have empty values (NA): the result filters them out. – user9776841 Feb 19 '21 at 11:32
@user9776841 See update with `na.omit` – ThomasIsCoding Feb 19 '21 at 11:33
It still filters NA values out of the result. – user9776841 Feb 19 '21 at 11:41
That apply can be cleaned up a bit: `function(x) names(sort(-table(na.omit(x)))[1])` – rawr Feb 19 '21 at 11:43
@rawr yes, exactly! Thanks! – ThomasIsCoding Feb 19 '21 at 12:59
@user9776841 Could you provide dummy data with `NA` as you mentioned? – ThomasIsCoding Feb 19 '21 at 13:00

score 0 · Answer 3 · answered Feb 19 '21 at 10:31

Does this work:

library(dplyr)
library(tidyr)
df %>% pivot_longer(cols = -id) %>% count(id, value) %>% 
     group_by(id) %>% 
            summarise(cat = value[which.max(n)])
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 2
#      id cat   
#   <int> <chr> 
# 1     1 banana
# 2     2 orange

score 0 · Answer 4 · answered Feb 19 '21 at 18:39

Using tidyverse

library(dplyr)
library(purrr)
df %>%
  transmute(id, cat = pmap_chr(.[-1], ~ Mode(c(...))))
#  id    cat
#1  1 banana
#2  2 orange

where

Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

data

df <- structure(list(id = 1:2, cat1 = c("apple", "orange"), cat2 = c("banana",
"banana"), cat3 = c("banana", "apple"), cat4 = c("orange", "orange"
)), class = "data.frame", row.names = c(NA, -2L))
