I am looking into COVID-19 data and doing some analysis. My code below reads a publicly accessible CSV file and keeps the 15 worst-hit countries by total deaths (using total_deaths as of July 3rd 2020).

library(dplyr)
library(ggplot2)

covid <- read.csv("https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)

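total <- covid %>%
        select(total_deaths, location, date) %>%
        filter(date == '2020-07-03')
# Sort by deaths (descending) and keep rows 2 to 16; row 1 is skipped,
# presumably because it is the aggregate "World" entry
total <- arrange(total, desc(total_deaths), location)[2:16, ]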

The data is sorted in descending order; however, when I run ggplot to plot the data, the bars are ordered alphabetically by country rather than by the highest figure. How can I get ggplot to respect the sorting that was done in the previous step?

ggplot(total, aes(location, total_deaths, group=1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        geom_bar(stat = "identity", fill = "green") +
        ggtitle("Top 15 Worst Countries") + xlab("Country") + ylab("Number of Deaths")

Another snag I hit is that the axis numbers are now shown in scientific notation. How can I get the plot to display the actual numbers?

ka4c

1 Answer

One approach is to use reorder to change the order of the factor levels, which is what ultimately determines the order of plotting. Negating total_deaths puts the country with the highest figure first.
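To see what reorder does, it can help to look at the factor levels directly. The sketch below uses base R only and a made-up three-row data frame (toy, with placeholder names and numbers that are not taken from the OWID file):

```r
# Toy data: placeholder names and made-up numbers
toy <- data.frame(
  location = c("A", "B", "C"),
  total_deaths = c(10, 300, 25),
  stringsAsFactors = FALSE
)

# A character column turned into a factor gets alphabetical levels
levels(factor(toy$location))

# reorder() sorts the levels by the second argument (ascending);
# negating the values puts the largest count first
ordered_location <- reorder(toy$location, -toy$total_deaths)
levels(ordered_location)
```

ggplot2 draws a discrete axis in level order, so with these toy values the bars would appear as B, C, A.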

You can use scales::comma to fix the scientific notation.
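For illustration, a rough base R stand-in for this kind of label function is sketched below. It is not the scales implementation; plain_commas is a hypothetical helper name and the tick values are made up.

```r
# Hypothetical helper: thousands separators, no scientific notation
plain_commas <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

ticks <- c(0, 50000, 100000, 150000)  # made-up axis breaks

# With default options, R switches to scientific notation for 1e5
format(ticks[3])

# The helper keeps the full digits and inserts commas
plain_commas(ticks)
```

A function like this could also be passed as labels = plain_commas to scale_y_continuous, since that argument accepts a function that maps break values to label strings.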

library(scales)
ggplot(total, aes(reorder(location, -total_deaths), total_deaths, group=1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        geom_bar(stat = "identity", fill = "green") +
        scale_y_continuous(labels = comma) +
        ggtitle("Top 15 Worst Countries") + xlab("Country") + ylab("Number of Deaths")
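An alternative sketch, closer to the "respect the existing sort" wording of the question, is to convert location into a factor whose levels follow the current row order. This assumes the data frame is already sorted the way you want; toy_sorted below is made-up data standing in for total.

```r
# Made-up data, already sorted by total_deaths in descending order
toy_sorted <- data.frame(
  location = c("B", "C", "A"),
  total_deaths = c(300, 25, 10),
  stringsAsFactors = FALSE
)

# Use the existing row order as the factor level order;
# unique() guards against repeated names
toy_sorted$location <- factor(toy_sorted$location,
                              levels = unique(toy_sorted$location))
levels(toy_sorted$location)
```

Applied to the question's data, the same idea would be total$location <- factor(total$location, levels = unique(total$location)), after which the original aes(location, total_deaths) should plot in sorted order.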

Ian Campbell