
I have a dataset with four variables measuring respondents' views on different topics. I want to plot them in one stacked bar chart so that the values can be compared across topics.

These are the first rows of the dataset:

lebanon <- structure(list(climate_change = c(
  "Not a very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
), air_quality = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A very serious problem"
), water_polution = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "Not at all a serious problem"
), trash = c(
  "A very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
)), row.names = c(NA, -6L), class = "data.frame")

I tried the following code, based on this site:

lebanon %>%
  filter(!is.na(climate_change), !is.na(air_quality), !is.na(water_polution), !is.na(trash)) %>%
  gather(variable, value, climate_change:trash) %>%
  ggplot(aes(x = variable, y = value, fill = value)) +
  geom_bar(stat = "identity") +
  coord_flip()

Getting this graph:

[image: the resulting stacked bar chart]

There are three problems with this graph.

1.) The bar graphs are not the same length.

2.) I don't know why there is something written where the x-axis meets the y-axis. How do I remove this?

3.) I want to order the values so they make sense, so I ordered them beforehand with:

dataset$climate_change <- factor(dataset$climate_change, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

dataset$air_quality <- factor(dataset$air_quality, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

dataset$water_polution <- factor(dataset$water_polution, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

Yet the values are still unordered. What am I doing wrong? Or is there a more effective way to make a multiple stacked bar chart?

Asked by Nicosc, edited by stefan

1 Answer


The main issue with your code is that you mapped value, i.e. a categorical variable, to y. Leave y unmapped and let geom_bar() count the rows instead. Further, you can simply use drop_na instead of filter, and set the levels of value once after the gather instead of repeating it for each variable. (; Try this:

BTW: Please put your data into the post with dput(), e.g. dput(head(lebanon)). See my edit to your post. Took more time to clean and get the data right than answering the question. (;

** EDIT ** To get the bars ordered in the wanted order I make use of the forcats package. First I add_count the number of respondents thinking the issue is "A very serious problem". Then I fct_reorder variable accordingly, i.e. -n to get it descending. To reverse the order of value I make use of fct_rev.

lebanon <- structure(list(climate_change = c(
  "Not a very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
), air_quality = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A very serious problem"
), water_polution = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "Not at all a serious problem"
), trash = c(
  "A very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
)), row.names = c(NA, -6L), class = "data.frame")

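library(tidyverse)

lebanon %>%
  # drop every respondent with an NA in any column
  drop_na() %>%
  # reshape to long format: one row per respondent and topic
  gather(variable, value, climate_change:trash) %>%
  # add n: number of rows per topic and per TRUE/FALSE result of the comparison
  add_count(variable, value == "A very serious problem") %>%
  # set the order of the answer categories once, for all topics
  mutate(value = factor(value, levels = c("Not at all a serious problem",
                                          "Not a very serious problem",
                                          "A somewhat serious problem",
                                          "A very serious problem"))) %>%
  # no y aesthetic: geom_bar() counts the rows itself
  ggplot(aes(x = forcats::fct_reorder(variable, -n), fill = forcats::fct_rev(value))) +
  geom_bar() +
  coord_flip()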
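
A possible variation, as a sketch rather than the definitive way to do it: add_count() with the comparison as a second grouping variable gives two different n values within a topic (one for the TRUE rows, one for the FALSE rows), so the ordering relies on how fct_reorder summarises them. The version below computes a single count per topic instead. It also passes the four topic columns to drop_na(), so NAs in any other columns of the full dataset are ignored, and it uses pivot_longer(), the newer tidyr replacement for gather(). The names lvls, topics, n_very, plot_data and p are made up for this example, and the data frame is a compact copy of the sample data from the question.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Answer categories in the order they should appear (least to most serious)
lvls <- c("Not at all a serious problem",
          "Not a very serious problem",
          "A somewhat serious problem",
          "A very serious problem")

# Compact copy of the sample data from the question (indexing with NA gives NA)
lebanon <- data.frame(
  climate_change = lvls[c(2, 2, NA, NA, 4, 3)],
  air_quality    = lvls[c(3, 2, NA, NA, 4, 4)],
  water_polution = lvls[c(3, 2, NA, NA, 4, 1)],
  trash          = lvls[c(4, 2, NA, NA, 4, 3)]
)

# Columns to plot; only these are checked for NAs
topics <- c("climate_change", "air_quality", "water_polution", "trash")

plot_data <- lebanon %>%
  drop_na(all_of(topics)) %>%
  pivot_longer(all_of(topics), names_to = "variable", values_to = "value") %>%
  # one count per topic: respondents answering "A very serious problem"
  group_by(variable) %>%
  mutate(n_very = sum(value == "A very serious problem")) %>%
  ungroup() %>%
  mutate(
    value    = factor(value, levels = lvls),
    # ascending order; after coord_flip() the last level is drawn at the top
    variable = fct_reorder(variable, n_very)
  )

p <- ggplot(plot_data, aes(x = variable, fill = fct_rev(value))) +
  geom_bar() +
  coord_flip() +
  labs(x = NULL, y = "Number of respondents", fill = NULL)

print(p)
```

If the topics end up with different numbers of valid answers, geom_bar(position = "fill") draws every bar at the same length and shows shares instead of counts.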

Answered by stefan
  • Awesome! Two questions: How do I change the order of the variables so the one with the most respondents saying that it is a very serious problem is at the top, with the rest in descending order? How do I put data with dput() from R to here? – Nicosc Jun 16 '20 at 16:26
  • Oh, and does drop_na() actually drop all respondents with at least one NA in the other variables? The dataset includes several columns that I didn't show. My intention in using filter(!is.na()) was to specify that I don't want NAs only in the specific variables. Respondents might have answered these variables without having answered every other variable. – Nicosc Jun 16 '20 at 16:53
  • Hi @Nicosc. First. Simply copy and paste the output from `dput(...)` into your post. Second. `drop_na` will drop all rows with at least one NA. If you only want to drop rows with NAs in specific columns of your data, then you can stick with filter. Concerning your third question I will have another look at the data. – stefan Jun 16 '20 at 18:09
  • I just made an edit. Now the order of value is reversed and the variable with the most respondents thinking that this is a very serious problem is on top. – stefan Jun 16 '20 at 18:18
  • Thanks :) If I write dput(head(lebanon)) in R and run it, it only prints a lot of things that don't seem to make sense. Maybe I'm not getting it. :/ – Nicosc Jun 16 '20 at 19:10
  • Yep. dput() simply converts your df into a format which doesn't have to make sense to humans but makes sense to R. (; But this way all the datatypes are preserved and, as in your case, quotes are put around all the strings, which makes it quick and easy to simply paste your data and start answering. That's why all guys always demand a dput(). BTW. Had to learn that myself, too. – stefan Jun 16 '20 at 19:17
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216086/discussion-between-nicosc-and-stefan). – Nicosc Jun 16 '20 at 19:35
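
As a small illustration of the dput() workflow discussed in the comments above, here is a sketch with a made-up data frame (df, txt and df2 are names invented for the example). The text that dput() prints is itself R code, so pasting it into a post lets anyone recreate the data, column types included.

```r
# Toy data frame standing in for real survey data
df <- data.frame(id = 1:3, answer = c("yes", NA, "no"))

# Print an R expression that rebuilds the first rows; this is what goes into a post
dput(head(df))

# Round trip: capture the printed text and evaluate it as R code
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))

# The rebuilt data frame should match the original, including column types
all.equal(df, df2)
```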