
Is it possible to use summarise with hclust in R?

S %>% summarise(hc = hclust(dist()))

This gives me an error:

Error: Problem with `summarise()` column `hc`.
ℹ `hc = hclust(dist())`.
x argument "x" is missing, with no default
ℹ The error occurred in group 1: country = ch.
  • What's the exact error? What is `S`? What variable are you trying to calculate the distance for? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick May 25 '21 at 04:54
  • I'll include it later, thanks. Basically, S is a data frame (all numerical columns except the categorical one, by which I want to group and compute the hclust). I have added the error to the description. – Emmanuel Goldstein May 25 '21 at 05:45

1 Answer

Use nest() to collapse the data frame into one row per group, then map() to run hclust() on each nested data frame. Using iris as an example:

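library(tidyverse)  # loads dplyr, tidyr and purrr

res = iris %>%
  nest(data = !Species) %>%                    # one row per Species, remaining columns nested
  mutate(hc = map(data, ~ hclust(dist(.x))))   # one hclust object per group
res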

# A tibble: 3 x 3
  Species    data              hc      
  <fct>      <list>            <list>  
1 setosa     <tibble [50 × 4]> <hclust>
2 versicolor <tibble [50 × 4]> <hclust>
3 virginica  <tibble [50 × 4]> <hclust>

plot(res$hc[[1]])

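For comparison, the same per-group clustering can be sketched in base R with split() and lapply(). This is only an illustration on iris, not part of the original answer; the names `groups` and `hc_list` are made up for the example.

```r
# Base-R sketch: one hclust object per Species, no tidyverse needed.
# Split the four numeric columns by the grouping factor ...
groups <- split(iris[, 1:4], iris$Species)

# ... then cluster the rows of each piece separately.
hc_list <- lapply(groups, function(d) hclust(dist(d)))

names(hc_list)          # one entry per level of Species
class(hc_list$setosa)   # an object of class "hclust"

# Plot the dendrogram of the first group, as in the answer above.
plot(hc_list[[1]])
```

The result is a plain named list rather than a list-column, so individual trees are reached with `hc_list$setosa` instead of `res$hc[[1]]`.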

StupidWolf
  • Thanks. I can reproduce your code; however, mine gives me an error: `Error: Problem with `mutate()` column `hc`. ℹ `hc = map(data, ~hclust(dist(t(.x))))`. x NA/NaN/Inf in foreign function call (arg 10) Run `rlang::last_error()` to see where the error occurred. In addition: Warning message: Problem with `mutate()` column `hc`. ℹ `hc = map(data, ~hclust(dist(t(.x))))`. ℹ NAs introduced by coercion`. I am using the transposed data (which works with the iris dataset) because I want to use the columns as rows when computing the distance. – Emmanuel Goldstein May 25 '21 at 09:50
  • You have NAs in your dataset. Without the dataset itself, this is a dead end. – StupidWolf May 25 '21 at 10:50
  • I don't have any NAs. It's all numeric except for the category, which I exclude. – Emmanuel Goldstein May 25 '21 at 10:57
  • If one of your variables has no variation, it will give you this error too – StupidWolf May 25 '21 at 13:10
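Since the asker's data was never posted, the cause of the error remains unconfirmed. The sketch below shows one way to check a group's data frame for the conditions raised in these comments. `check_group` and the toy data are hypothetical and not from the thread. The "NAs introduced by coercion" warning would be consistent with a non-numeric column still being present when t() is applied, but that is an assumption about the asker's data.

```r
# Hypothetical helper (not from the thread): for one group's data frame,
# list the columns that commonly lead to NA/NaN/Inf errors in hclust().
check_group <- function(d) {
  is_num <- vapply(d, is.numeric, logical(1))
  num <- d[, is_num, drop = FALSE]
  list(
    # columns that force t() to return a character matrix
    non_numeric = names(d)[!is_num],
    # numeric columns containing NA, NaN or Inf
    non_finite  = names(num)[vapply(num, function(col) any(!is.finite(col)), logical(1))],
    # numeric columns with a single distinct value (no variation)
    constant    = names(num)[vapply(num, function(col) length(unique(col)) == 1, logical(1))]
  )
}

# Toy data (made up): three numeric columns plus a stray character column.
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 7), c = c(9, 8, 6),
                  id = c("x", "y", "z"))
check_group(toy)

# Keeping only the numeric columns before transposing avoids the coercion,
# so the columns can be clustered as rows.
num_only <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]
hc <- hclust(dist(t(num_only)))
```

With the nested tibble from the answer, the same check could be run per group with something like `map(res$data, check_group)`.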