I'm still new to R and could use some help. I have a dataset that looks something like this:

a <- c("a", "b", "c", "d", "a", "d") 
E <- c(NA, "E", NA, "E", NA, "E")
F <- c(NA, "F", "F", "F", NA, NA)
G <- c("G", NA, "G", "G", "G", NA)

df <- data.frame (a, E, F, G)

I'm trying to find out which of E, F, or G occurs most often per group when I group by a. My biggest issue seems to be that the values are characters spread across three separate columns. I tried combining them into one column, but it didn't work. After searching for hours I still can't find an answer to what I would think should be an easy question. Any help would be amazing. Thanks!

Edit: Sorry, I'm very new to the site and am still getting the formatting down. The correct output would ideally be something like:

  a   Mostcommon
  -   ----------
1  a     "G"
2  b    "E""F"
3  c    "F""G"
4  d     "E"

This is for the example I gave; with my actual data there should be only one most common value per group.

Clara W
  • Are these all in a data frame, something like `df <- data.frame(a, E, F, G)`? And are your `NA` values missing values (without quotes, `NA`) or strings with quotes `"NA"`? Could you show the expected output for this sample input? – Gregor Thomas May 03 '22 at 14:59
  • So what exactly is the correct output for this input. Are these supposed to be columns in a data.frame or are they truly separate vectors? – MrFlick May 03 '22 at 15:00

2 Answers

Is this what you'd like to do?

library(tidyverse)

tibble(
  a = c("a", "b", "c", "d", "a", "d"),
  # Real missing values (unquoted NA), so that is.na() below detects them
  E = c(NA, "E", NA, "E", NA, "E"),
  F = c(NA, "F", "F", "F", NA, NA),
  G = c("G", NA, "G", "G", "G", NA)
) |> 
  mutate(across(E:G, ~if_else(is.na(.), 0, 1))) |> 
  group_by(a) |> 
  summarise(across(E:G, sum))
#> # A tibble: 4 × 4
#>   a         E     F     G
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         0     0     2
#> 2 b         1     1     0
#> 3 c         0     1     1
#> 4 d         2     1     1

Created on 2022-05-03 by the reprex package (v2.0.1)
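
The table above gives counts per group, while the question asks for the label of the most frequent column. As a minimal sketch (not part of the original answer), the counts can be reshaped to long form and filtered to the per-group maximum. The names `counts`, `column`, `n`, and `most_common` are illustrative choices, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with real NA values
df <- data.frame(
  a = c("a", "b", "c", "d", "a", "d"),
  E = c(NA, "E", NA, "E", NA, "E"),
  F = c(NA, "F", "F", "F", NA, NA),
  G = c("G", NA, "G", "G", "G", NA)
)

# Step 1: count non-missing entries per column within each group
counts <- df |>
  mutate(across(E:G, ~ as.integer(!is.na(.x)))) |>
  group_by(a) |>
  summarise(across(E:G, sum))

# Step 2: reshape to one row per (group, column) and keep the maxima;
# tied columns are joined into one comma-separated string
most_common <- counts |>
  pivot_longer(E:G, names_to = "column", values_to = "n") |>
  group_by(a) |>
  filter(n == max(n)) |>
  summarise(most_common = toString(column))

most_common
```

With the sample data this should give `G` for group a, `E, F` for b, `F, G` for c, and `E` for d, matching the output requested in the question.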

Gregor Thomas
Carl
  • I got this to work! I had to replace my NA values with 0 character values but then it worked great. I can totally make this work for what I need. Thank you! – Clara W May 03 '22 at 15:38
  • Changed `== "NA"` to `is.na()` now that the question has been updated. – Gregor Thomas May 03 '22 at 15:51

You could use the `Modes` function defined here. I have copied it below:

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
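
To see what `Modes` returns, here is a small sketch on hand-made vectors (the inputs are illustrative, not taken from the question's data). It repeats the function so the snippet runs on its own, and it shows that ties yield every tied value and that `NA` is counted like any other value unless it is removed first.

```r
# Same definition as above, repeated so this snippet is self-contained
Modes <- function(x) {
  ux <- unique(x)                # distinct values, in order of first appearance
  tab <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[tab == max(tab)]            # every value that reaches the highest count
}

single <- Modes(c("G", "G", "E"))   # one clear winner
tied   <- Modes(c("E", "F"))        # a tie returns both values
with_na <- Modes(c(NA, NA, "E"))    # NA wins here because it occurs twice

print(single)
print(tied)
print(with_na)
```

The last case suggests why dropping missing values before calling `Modes` matters when the columns are mostly `NA`.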

Now, with the `Modes` function, do the following:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-a, values_drop_na = TRUE) %>%
  group_by(a) %>%
  summarize(most_common = toString(Modes(value)))

# A tibble: 4 x 2
  a     most_common
  <chr> <chr>      
1 a     G          
2 b     E, F       
3 c     F, G       
4 d     E         
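
The question notes that the real data should have only one most common value per group. Under that assumption, a possible alternative sketch (not part of the original answer) counts values per group and keeps the single top row with `slice_max()`. The name `single_mode` is illustrative, and with `with_ties = FALSE` any tie is resolved by keeping just one of the tied rows, so this is only appropriate when ties are not expected or not important.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with real NA values
df <- data.frame(
  a = c("a", "b", "c", "d", "a", "d"),
  E = c(NA, "E", NA, "E", NA, "E"),
  F = c(NA, "F", "F", "F", NA, NA),
  G = c("G", NA, "G", "G", "G", NA)
)

single_mode <- df %>%
  pivot_longer(-a, values_drop_na = TRUE) %>%   # one row per non-missing value
  count(a, value) %>%                           # occurrences of each value per group
  group_by(a) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%    # keep exactly one top row per group
  ungroup()

single_mode
```

For groups a and d, which have a clear winner in the sample data, this should return `G` and `E` respectively.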
Onyambu