
This is where I get my dataset and tidy it:

board_game_original <- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")

# Tidy up the mechanic and category columns with cSplit(): the comma-separated values are split into rows (long format)
library(splitstackshape)
board_game <- board_game_original
mechanic <- board_game$mechanic
board_game_tidy <- cSplit(board_game, splitCols = c("mechanic", "category"), sep = ",", direction = "long")
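For readers unfamiliar with `cSplit(..., direction = "long")`: the idea is that a game listing several comma-separated categories becomes several rows, with the other columns repeated on each row. Below is a rough base-R sketch of that idea for a single column on hypothetical toy data. The names `games` and `games_long` are made up, and this is only an illustration of the reshaping, not how splitstackshape implements it.

```r
# Hypothetical toy data standing in for board_game: one row per game,
# with several categories packed into one comma-separated string
games <- data.frame(
  name = c("Game A", "Game B"),
  category = c("Card Game,Fantasy", "Wargame"),
  average_complexity = c(2.5, 3.8),
  stringsAsFactors = FALSE
)

# Split each string on commas (one character vector per game)
pieces <- strsplit(games$category, ",", fixed = TRUE)

# Repeat each game's row once per piece, then drop in the split values
games_long <- games[rep(seq_len(nrow(games)), lengths(pieces)), ]
games_long$category <- trimws(unlist(pieces))
rownames(games_long) <- NULL

print(games_long)
```

After this reshaping, "Game A" appears once under "Card Game" and once under "Fantasy", so every category ends up with one row per game tagged with it, and different categories end up with different numbers of rows.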

I am trying to make the graph more organized by ordering the bars by their values on the y-axis. I tried using the `reorder()` function, but it still does not work. Does anyone have any suggestions? I am quite new to R and hope to learn more!

library(ggplot2)
library(dplyr)  # provides %>%, filter() and select()
average_complexity <- board_game_tidy %>% 
            filter(yearpublished >= 1950, users_rated >= 25, average_complexity>0 ) %>%
            select(average_complexity)
category_complexity_graph <- ggplot(data=board_game_tidy, aes(x = reorder(category, -average_complexity), y = average_complexity, na.rm = TRUE)) + 
        geom_bar(stat = "identity", na.rm = TRUE, color="white",fill="sky blue") + 
        ylim(0,5) +
        theme_bw() +
        ggtitle("Which category of board games has the highest level of average complexity") +
        xlab("category of board games") +
        ylab("average complexity of the board game") +
        theme(axis.text.x = element_text(size=5, angle = 45)) +
        theme(plot.title = element_text(hjust = 0.5)) 
category_complexity_graph

Here's the graph I plotted:

[Image: the original bar chart of average complexity by board game category]

"Category" is a categorical variable and "average complexity" is a continuous variable.

I was trying to answer the question "Which category has the highest average complexity?", but this graph looks messy, so any suggestions for cleaning it up would be appreciated as well! Thank you all.

harperzhu
  • Please add data using `dput` or something that we can copy and use. Read about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Oct 26 '20 at 03:54
  • It looks like you are using `reorder()` in the `ggplot()` call correctly (see the reference here: https://sebastiansauer.github.io/ordering-bars/). However, I agree with @RonakShah: please provide a reproducible example so the community here can better help you. – LC-datascientist Oct 26 '20 at 04:30
  • A couple of suggestions: (1) Double-check that you are using the generic `stats::reorder()` function, because other packages you have loaded may have overwritten (masked) it; (2) Do not give an object the same name as a vector/column inside another object, i.e., you have `average_complexity` as a data.frame and as a column inside `board_game_tidy`; it may confuse the program (and/or yourself). In other words, it looks pointless that you created `average_complexity` as a data.frame but were not using it (or maybe you are?). – LC-datascientist Oct 26 '20 at 04:32
  • Does this answer your question? [Reorder bars in geom\_bar ggplot2 by value](https://stackoverflow.com/questions/25664007/reorder-bars-in-geom-bar-ggplot2-by-value). – stefan Oct 26 '20 at 06:54
  • Thank you all for responding! I have tried `stats::reorder()` and it still doesn't work. I'll now update the source dataset for a reproducible example. – harperzhu Oct 26 '20 at 18:25
  • @stefan Thank you for providing the link! I have used a method similar to `reorder()`, but it does not seem to work. I wonder if there are any subtle differences that I did not notice? – harperzhu Oct 26 '20 at 18:28
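On the `dput` request in the first comment: `dput()` prints R code that recreates an object, so pasting its output into a question lets others rebuild the data without downloading anything. A minimal sketch on a hypothetical three-row data frame (`example_data` is a made-up name); for the real data you would pass a small subset instead, e.g. `head()` of `board_game_tidy` converted with `as.data.frame()` first, since `cSplit()` returns a data.table and data.table objects can carry an internal pointer attribute that does not paste back cleanly.

```r
# Hypothetical small subset of the kind of data in the question
example_data <- data.frame(
  category = c("Wargame", "Card Game", "Wargame"),
  average_complexity = c(3.8, 1.9, 3.1),
  stringsAsFactors = FALSE
)

# Prints code that rebuilds example_data; this is what you paste
dput(example_data)

# Round trip: evaluating the printed code gives back the same object
recreated <- eval(parse(text = capture.output(dput(example_data))))
print(identical(recreated, example_data))
```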

1 Answer


Maybe this is what you are looking for. The issue is not about reordering; it is about preparing your data. Put differently, reordering by the average does not give you a nice plot, because you have multiple observations per category and, more importantly, a different number of observations per category. When you make a bar plot from this dataset, all these observations get stacked, i.e. your plot shows the sum of the average complexities. Meanwhile, `reorder()` sorts the categories by the mean of `average_complexity` (its default `FUN = mean`), so the order of the categories and the heights of the stacked bars do not agree. Hence, to achieve your desired result you first have to summarise your dataset by category. After doing so, your reordering code works and gives you a nice plot.
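To see the mismatch concretely, here is a small base-R sketch on hypothetical toy data (the values and the names `games`, `ordered_levels`, `bar_heights` are made up). It compares what `reorder()` sorts by (the per-category mean) with what the stacked bars display (the per-category sum):

```r
# Hypothetical toy data: several light "Party" games, one heavy "Wargame"
games <- data.frame(
  category = c("Party", "Party", "Party", "Party", "Wargame"),
  average_complexity = c(1.2, 1.0, 1.4, 1.3, 4.0),
  stringsAsFactors = FALSE
)

# reorder() scores each category with its default FUN = mean:
# Party = 1.225, Wargame = 4.0. The minus sign (as in the question)
# sorts categories from highest to lowest mean.
ordered_levels <- levels(reorder(games$category, -games$average_complexity))
print(ordered_levels)  # "Wargame" "Party"

# geom_bar(stat = "identity") stacks every row, so each bar's height
# is the SUM of the rows in that category: Party = 4.9, Wargame = 4.0
bar_heights <- tapply(games$average_complexity, games$category, sum)

# Heights in plotted order: 4.0 then 4.9, i.e. not descending
print(as.vector(bar_heights[ordered_levels]))
```

The many light games in "Party" add up to the tallest bar even though that category has the lowest mean, which is exactly why the ordering looks broken until the data are summarised to one value per category.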

In addition, I would suggest flipping the axes, which makes the labels easier to read:

board_game_original<- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")

# Tidy up the mechanic and category columns with cSplit() (comma-separated values split into rows)
library(splitstackshape)
board_game <- board_game_original
mechanic <- board_game$mechanic
board_game_tidy <- cSplit(board_game,splitCols=c("mechanic","category"), sep = ",", direction = "long")

library(ggplot2)
library(dplyr)
# Summarise your dataset
board_game_tidy1 <- board_game_tidy %>% 
  as_tibble() %>% 
  filter(yearpublished >= 1950, users_rated >= 25, average_complexity > 0, !is.na(category)) %>%
  group_by(category) %>% 
  summarise(n = n(), average_complexity = mean(average_complexity, na.rm = TRUE))

ggplot(data=board_game_tidy1, aes(x = reorder(category, average_complexity), y = average_complexity, na.rm = TRUE)) + 
  geom_bar(stat = "identity", na.rm = TRUE, color="white",fill="sky blue") + 
  ylim(0,5) +
  theme_bw() +
  ggtitle("Which category of board games has the highest level of average complexity") +
  xlab("category of board games") +
  ylab("average complexity of the board game") +
  #theme(axis.text.x = element_text(size=5, angle = 45)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()
 

[Image: the resulting bar chart, with categories ordered by average complexity and the axes flipped]
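The same summarise-then-order idea can be sketched without the full dataset or the tidyverse. This is a base-R stand-in on hypothetical toy data (`games`, `by_category`, `mids` are made-up names): `aggregate()` plays the role of dplyr's `group_by()`/`summarise()`, and `barplot(horiz = TRUE)` plays the role of ggplot2's `coord_flip()`. It swaps in base graphics and is not the ggplot2 code above.

```r
# Hypothetical toy data standing in for board_game_tidy
games <- data.frame(
  category = c("Party", "Party", "Party", "Party", "Wargame", "Abstract"),
  average_complexity = c(1.2, 1.0, 1.4, 1.3, 4.0, 2.5),
  stringsAsFactors = FALSE
)

# Base-R analogue of group_by(category) %>% summarise(mean(...)):
# one row per category, so each bar is a single mean, not a stack
by_category <- aggregate(average_complexity ~ category, data = games, FUN = mean)

# With one value per category, sorting by that value orders the bars
by_category <- by_category[order(by_category$average_complexity), ]

# Horizontal bars; las = 1 keeps the category labels horizontal,
# and barplot() returns the bar midpoints
mids <- barplot(by_category$average_complexity,
                names.arg = by_category$category,
                horiz = TRUE, las = 1, xlim = c(0, 5),
                col = "skyblue", border = "white",
                xlab = "average complexity of the board game")
```

With `horiz = TRUE` the first row is drawn at the bottom, so sorting in increasing order puts the most complex category at the top, mirroring what `reorder(category, average_complexity)` plus `coord_flip()` does in the answer.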

stefan