
When I want to find the unique values of a single column of a data frame, I write:

unique(data$column_01)

Now, I want to find the unique values of several columns of that data frame. I write:

unique((select(data, starts_with("column")))) 

Can someone please clarify the mistake I am making here? What is the right approach, and why is this one wrong?

danish

2 Answers


You need to "apply" or "map" the unique function to each column:

lapply(select(data, starts_with("column")), unique)

You can use sapply or purrr::map instead of lapply for slight variations in behavior. This FAQ gives an excellent overview of the base options and when they are applicable. See the purrr package documentation for details on purrr::map.
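To illustrate one of those variations, here is a minimal sketch in base R. The data frame and its column names are made up for the example. lapply always returns a list, while sapply tries to simplify the result and only succeeds when every column yields the same number of unique values.

```r
# Hypothetical toy data; the column names are illustrative only.
cols <- data.frame(
  column_01 = c(1, 1, 2, 3),
  column_02 = c(5, 5, 5, 6)
)

# lapply() always returns a named list, one element per column.
as_list <- lapply(cols, unique)

# sapply() attempts to simplify. The columns here have 3 and 2 unique
# values, so no simplification is possible and a list is returned.
not_simplified <- sapply(cols, unique)

# When every column has the same number of unique values,
# sapply() simplifies the result to a matrix.
same_len <- sapply(data.frame(a = c(1, 1, 2), b = c(3, 4, 4)), unique)
```

Because the return type of sapply depends on the data, lapply is usually the safer choice inside scripts and functions.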

As for "why this one is wrong" - unique() applied to a data frame will give you the unique rows of the data frame. This is different from the unique values of each column. Generally, functions may need to work differently on data frames than on vectors (columns), so foo(dataframe) cannot be assumed to be foo() applied to each column. So we use lapply or similar functions to specifically apply foo() to each column.
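The difference can be seen on a small example. This is only a sketch: the data frame is invented, and the column selection uses base R's startsWith as a stand-in for dplyr's select(data, starts_with("column")) so that it runs without extra packages.

```r
# Hypothetical toy data frame; the column names are illustrative only.
data <- data.frame(
  column_01 = c(1, 1, 2, 2),
  column_02 = c("a", "a", "a", "b"),
  other     = c(10, 20, 30, 40)
)

# Keep the columns whose names start with "column"
# (base R stand-in for select(data, starts_with("column"))).
cols <- data[startsWith(names(data), "column")]

# unique() on a data frame drops duplicated ROWS.
unique_rows <- unique(cols)

# lapply() applies unique() to each COLUMN and returns a named list.
unique_values <- lapply(cols, unique)
```

Here unique_rows is still a data frame, with one row per distinct combination of the two columns, whereas unique_values holds a separate vector of distinct values for each column.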

Gregor Thomas

Here is an option with summarise:

library(dplyr)
data %>%
    summarise(across(starts_with('column'), ~ list(unique(.))))
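
As a rough sketch of what this returns, assuming dplyr 1.0 or later is installed and using an invented toy tibble: the result is a one-row data frame in which each selected column is a list column holding that column's unique values, so the values are pulled out with [[1]].

```r
library(dplyr)

# Hypothetical toy data; the column names are illustrative only.
data <- tibble(
  column_01 = c(1, 1, 2),
  column_02 = c("a", "b", "b"),
  other     = 1:3
)

# One row; each selected column becomes a list column.
result <- data %>%
  summarise(across(starts_with("column"), ~ list(unique(.x))))

# Extract the unique values of a single column from its list cell.
vals_01 <- result$column_01[[1]]
vals_02 <- result$column_02[[1]]
```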
akrun