I am running a cluster analysis on a random dataset (the precise dataset doesn't matter). After performing the analysis and choosing the optimal number of clusters to pass to cutree(), I have a repetitive bit of code for generating summaries that I am trying to replace with a loop.

The code used up to this point is...

    library(cluster)
    library(dplyr)
    library(psych)

    df_dist <- dist(df)                                 
    df_hclust <- hclust(df_dist, method = "ward.D2")    
    plot(df_hclust, hang = -1)                          
    df_hclust_cut <- cutree(df_hclust, k = 4)           
    df <- mutate(df, cluster = df_hclust_cut)   

The section of code I want to replace is...

    cluster1 <- filter(df, cluster == 1)                
    cluster2 <- filter(df, cluster == 2)
    cluster3 <- filter(df, cluster == 3)
    cluster4 <- filter(df, cluster == 4)

The code I was hoping to use was...

    for (C in 1:4) {
      paste0("cluster", C) <- filter(df, cluster == C)
    }

But then I get the following error message...

    Error in paste0("cluster", C) <- filter(df, cluster == C) : 
      target of assignment expands to non-language object

How do I write a loop that creates the four filtered datasets, changing the value used in filter() on each iteration?

A. Suliman
ARH

`cluster <- lapply(1:4, function(x) filter(df, cluster == x))`, or since you're using tidyverse you may want to do `cluster <- map(1:4, ~ filter(df, cluster == x))`. It's better not to have them as separate objects. – IceCreamToucan Aug 12 '19 at 15:33

See `?assign` but before that read this [why-is-using-assign-bad](https://stackoverflow.com/questions/17559390/why-is-using-assign-bad) – A. Suliman Aug 12 '19 at 15:35

sorry, the tidyverse (purrr) should have been `cluster <- map(1:4, ~ filter(df, cluster == .x))` – IceCreamToucan Aug 12 '19 at 15:42

1 Answer

Assignment with `<-` doesn't work when the target name is a string built by an expression such as `paste0()`.

The solution is to pass the constructed name and the value to `assign()`:

    for (C in 1:4) {
      assign(paste0("cluster", C), filter(df, cluster == C))
    }
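
As the comments point out, it is usually better not to create separate objects at all. Here is a minimal sketch of that list-based alternative, using base R's `split()` instead of a loop. `USArrests` is only a stand-in because the question's `df` isn't shown, and `clusters` and `cluster_means` are illustrative names, not anything from the original code:

```r
# Stand-in data (assumption): the question's df is not shown
df <- USArrests

# Same clustering steps as in the question, in base R
df_hclust <- hclust(dist(df), method = "ward.D2")
df$cluster <- cutree(df_hclust, k = 4)

# One data frame per cluster, stored in a list named "1" to "4"
clusters <- split(df, df$cluster)

# Apply the same summary to every cluster without repeating code
cluster_means <- lapply(clusters, function(d) {
  colMeans(d[, names(d) != "cluster"])
})
```

An individual cluster is then available as `clusters[["1"]]`, and any per-cluster summary can be run over the whole list with `lapply()` rather than once per `cluster1`, `cluster2`, and so on.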
geekzeus