
I have about 1K observations for each country, and I have used facet_wrap to display each country's geom_bar, but the panels come out in alphabetical order. I would like to cluster or order them by skew instead: the most positively skewed countries first, moving through the roughly normally distributed countries, and ending with the most negatively skewed, without eyeballing which countries are most similar to each other. I was thinking psych::describe() might be useful since it calculates skew, but I am having a hard time figuring out how I would implement adding that information, as was done in a similar question.

Any suggestions would be helpful.

ibm

2 Answers


I can't go into too much detail without a reproducible example, but this would be my general approach. Use psych::describe() to create a vector of countries, country_order, sorted from most positive skew to least positive skew. Next, convert the country column in your dataset to a factor with country = factor(country, levels = country_order). When you use facet_wrap, the plots will be displayed in the same order as country_order.
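As a rough sketch of that approach on made-up data (the data frame, the country labels A/B/C, and the helper sample_skew are all hypothetical, not from the question): sample_skew is a plain moment-based skewness used here only to keep the example free of extra packages, so its values can differ slightly from what psych reports, though the idea of ordering the factor levels is the same.

```r
library(ggplot2)

set.seed(1)
# Made-up data: three countries with 1000 observations each
df <- data.frame(
  Country = rep(c("A", "B", "C"), each = 1000),
  DV = c(round(rbeta(1000, 8, 2) * 10),  # A: negative skew
         round(rbeta(1000, 2, 8) * 10),  # B: positive skew
         round(rbeta(1000, 5, 5) * 10))  # C: roughly symmetric
)

# Hypothetical helper: moment-based sample skewness (stand-in for psych's skew)
sample_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One skew value per country, as a named numeric vector
skew_by_country <- sapply(split(df$DV, df$Country), sample_skew)

# Country names sorted from most positive to most negative skew
country_order <- names(sort(skew_by_country, decreasing = TRUE))

# Re-level the factor so facet_wrap follows country_order, not the alphabet
df$Country <- factor(df$Country, levels = country_order)

p <- ggplot(df, aes(x = DV)) +
  geom_bar() +
  facet_wrap(~ Country)
```

Printing p should then draw the panels in the order of country_order rather than alphabetically, because facet_wrap follows the factor's level order.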

Jeff Bezos
  • Thanks! I decided to post the full solution after some troubleshooting so others can use it if they have the same idea. – ibm May 27 '20 at 15:16

After some troubleshooting, I found (what I think is) an efficient way of doing it:

library(dplyr)    # select(), mutate(), left_join(), rename()
library(magrittr) # %<>%
library(forcats)  # fct_reorder()
library(ggplot2)

# describeBy() with mat = TRUE returns one row per group in a matrix-style data frame that is easy to merge into df
skews <- psych::describeBy(df$DV, df$Country, mat = TRUE)
skews %<>% select(group1, mean, skew) %>% sjlabelled::as_factor(., group1) # turn group1 into a factor; I also kept the country means
# I was getting an error that my levels were inconsistent even though they were the same (group1 came from df$Country). I think this was because Country has Germany as its reference category, which threw off the alphabetical sort of group1, so I used [dfrankow's answer][1]
combined <- sort(union(levels(df$Country), levels(skews$group1)))
df <- left_join(mutate(df, Country = factor(Country, levels = combined)),
                mutate(skews, Country = factor(group1, levels = combined)),
                by = "Country") %>%
  rename(`Country skew` = "skew", `Country mean` = "mean") %>%
  select(-group1)

df$`Country skew` <- round(df$`Country skew`, 2)
ggplot(df) +
  geom_bar(aes(x = DV, y = after_stat(prop))) + # after_stat() replaces ..prop.. in ggplot2 >= 3.3.0
  xlab("Scale axis text") + ylab("Proportion") +
  scale_x_continuous() +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  ggtitle("DV distribution by country mean") +
  # reordering inside facet_wrap keeps the level order that was important for my lm intact;
  # the column is now called `Country mean` after the rename above (use `Country skew` to order by skew)
  facet_wrap(~ fct_reorder(Country, `Country mean`), nrow = 2)
ibm