
I wanted to use this time to improve my R skills. I chose COVID-19 as my topic and would like to visualize some data and maybe analyze it. I am interested in how globalization is connected to the pandemic (perhaps via a regression analysis), but first I would like to visualize some data. Do you have any tips on which package would be helpful for this? I have already tried a few things and am not really getting anywhere. My idea was a simple time series plot of the cumulative ECDC data, which can be found in almost every newspaper nowadays. As data I used:

data <- read.csv(file = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/ecdc/total_cases_per_million.csv")

I have already looked at some tutorials and searched here on Stack Overflow, but so far I have not managed to produce a sensible plot. My goal is to recreate the following two pictures in R:

[Image] [Image]

Christopher Moore
  • 1
    Hello! Check out https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. The question that you've posed here is a bit more general than is usually covered here. – Daniel V May 31 '20 at 21:49
  • 1
    Does this answer your question? [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – HelpingHand Jun 01 '20 at 06:42
  • COVID-19 data are no different from other data of that kind, so the title of your question is quite meaningless. – jangorecki Jun 01 '20 at 21:06

1 Answer


Since the question seems to be mostly about how to get started with visualising this data, here is how one would plot a simple time series graph with some countries highlighted, using the ggplot2 package.

Library and data import

library(tidyverse)

data <- read.csv(file = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/ecdc/total_cases_per_million.csv")

The data is in what is called a 'wide' format, where each column is either the date or a region. ggplot2 works better with long data, where each observation is a row. You could convert the data as follows:

long <- pivot_longer(data, World:Zimbabwe)
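To see what this conversion does without downloading anything, here is a minimal sketch on a toy table. The data frame `wide` and its values are made up for illustration and are not real case counts. It selects the columns with `-date` (everything except the date) rather than `World:Zimbabwe`, which should give the same result here but does not depend on the column order of the file.

```r
library(tidyr)

# Toy wide table: one row per date, one column per region.
# The numbers are invented for illustration only.
wide <- data.frame(
  date = c("2020-03-01", "2020-03-02"),
  World = c(1.0, 1.5),
  Russia = c(0.1, 0.2),
  stringsAsFactors = FALSE
)

# Pivot every column except `date`. By default the former column
# names go into `name` and the cell contents into `value`.
long_toy <- pivot_longer(wide, -date)
print(long_toy)
```

The result has one row per date and region combination, which is the shape the plotting code further down expects.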

Next, we need to clean the data a little by converting the date column to the Date class and dropping NA observations. The latter is optional but recommended: if you still get NA warnings afterwards, they probably stem from a human error rather than from the data.

long$date <- as.Date(long$date)
long <- long[!is.na(long$value),]
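As a quick check of what these two lines do, the following sketch applies them to a small made-up data frame (`toy` and its values are assumptions for illustration only):

```r
# Toy long table with a character date column and one missing value.
toy <- data.frame(
  date = c("2020-03-01", "2020-03-02", "2020-03-03"),
  name = "Russia",
  value = c(NA, 0.1, 0.2),
  stringsAsFactors = FALSE
)

# Convert the ISO-formatted strings to Date so ggplot2 uses a date axis.
toy$date <- as.Date(toy$date)

# Keep only rows where a value was actually reported.
toy <- toy[!is.na(toy$value), ]
str(toy)
```

After this, `date` is of class Date and the row with the missing value is gone.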

We could pick a few countries that we would like to highlight.

highlight_countries <- c("Russia", "San.Marino", "United.States")

Then we can make a line plot out of this. There are a lot of tutorials on how to use ggplot2, so you could search for those to customise the plot to your specific needs.

ggplot(long, aes(x = date, y = value)) +
  geom_line(aes(group = name,
                colour = ifelse(name %in% highlight_countries, name, NA))) +
  scale_colour_discrete(name = "Regions", labels = c(highlight_countries, "Other"))

Created on 2020-05-31 by the reprex package (v0.3.0)
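As a possible variation on the plot above (not the approach used in this answer), one could store the highlight label in its own column and assign colours by hand, so that all remaining countries are drawn in grey under an explicit "Other" label. This is a minimal sketch on made-up data; the column name `group_label`, the colour choices and the toy values are assumptions for illustration.

```r
library(ggplot2)

# Toy long table: three regions, three dates each (invented values).
toy <- data.frame(
  date = rep(as.Date("2020-03-01") + 0:2, times = 3),
  name = rep(c("Russia", "San.Marino", "Italy"), each = 3),
  value = c(1, 2, 4, 2, 5, 9, 3, 6, 8)
)

highlight <- c("Russia", "San.Marino")

# Label highlighted regions by name and lump everything else into "Other".
toy$group_label <- ifelse(toy$name %in% highlight, toy$name, "Other")

# One line per region (group = name), coloured by the label column.
p <- ggplot(toy, aes(x = date, y = value, group = name, colour = group_label)) +
  geom_line() +
  scale_colour_manual(
    name = "Regions",
    values = c("Russia" = "firebrick", "San.Marino" = "steelblue", "Other" = "grey70")
  )
print(p)
```

Keeping the label in a column makes the legend entries explicit instead of relying on how the scale treats NA values.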

teunbrand