I am trying to plot the number of COVID-19 cases in Italy over time. I came across this repository on GitHub and tried to subset the data for Italy as follows:

require(RCurl)
# Download the raw CSV as text, then parse it
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep = ",", header = TRUE)
str(corona)
# Keep the first Italy row and drop the four metadata columns (province, country, lat, long)
Italy <- corona[corona$Country.Region == 'Italy', ][1, 5:ncol(corona)]
head(Italy)[,45:52]

which outputs:

> head(Italy)[,45:52]
   X3.6.20 X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20
17    4636    5883    7375    9172    10149    12462    12462
   X3.13.20
17    17660

Looking into converting this to a time series with xts led me to several posts on converting a data frame to a time series where every day is a row in a Date variable, but in this data frame each date appears to be a separate variable (column).

I don't necessarily need this formatted as a time series, but I would like to plot the number of cases over time.


Here is a way to bypass the time series conversion:

require(RCurl)
# Download the raw CSV as text, then parse it
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep = ",", header = TRUE)
str(corona)
# Keep the first Italy row and drop the four metadata columns
Italy <- corona[corona$Country.Region == 'Italy', ][1, 5:ncol(corona)]
# One-column numeric matrix: one row per date
Italy <- as.matrix(sapply(Italy, as.numeric))
plot(Italy[, 1], type = 'l', xlab = '', ylab = '', col = 'red', lwd = 3,
     main = "Italy Cov-19 cum cases")
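The plot above uses the row index as the x axis. A minimal sketch of putting real dates on the axis in base R follows, assuming the column names keep the `X3.6.20` pattern shown in the question. The small `Italy` data frame here is a stand-in built from the first five values of that output, not the full download.

```r
# Toy stand-in for the one-row Italy data frame
# (values copied from the output shown in the question).
Italy <- data.frame(X3.6.20 = 4636, X3.7.20 = 5883, X3.8.20 = 7375,
                    X3.9.20 = 9172, X3.10.20 = 10149)

# Column names look like "X3.6.20": drop the leading "X",
# then parse the rest as month.day.year.
dates <- as.Date(sub("^X", "", names(Italy)), format = "%m.%d.%y")
cases <- as.numeric(unlist(Italy))

# Passing Date values as x makes plot() draw a date axis.
plot(dates, cases, type = "l", col = "red", lwd = 3,
     xlab = "", ylab = "", main = "Italy Cov-19 cum cases")
```

With the real data, the first two lines can be dropped and `Italy` taken as the one-row data frame before it is converted to a matrix.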
Cerbrus
Antoni Parellada

2 Answers

Here is a solution with tidyverse.

First, I use read_csv to read the CSV file directly from the URL (the message it prints lists the guessed column classes, which you can copy into the call via col_types; here all classes were guessed correctly):

library(tidyverse)

data <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

The dates are stored as column names. I use pivot_longer to transform the data into long format. Once the dates are in the new column date, we can use lubridate::mdy (mdy = month/day/year) to convert them to a proper date format:

data_long <- data %>% 
  pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "cases") %>% 
  mutate(date = lubridate::mdy(date))
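To see what this reshape does without downloading anything, here is a small sketch on a toy table. The `wide` tibble is a hypothetical stand-in for the real file: the three case counts are taken from the output in the question, while the coordinates are only illustrative.

```r
library(tidyr)
library(dplyr)

# Hypothetical one-row stand-in for the downloaded data (wide format:
# one column per date, named month/day/year as in the CSV header).
wide <- tibble::tibble(
  `Province/State` = NA_character_,
  `Country/Region` = "Italy",
  Lat = 43, Long = 12,   # illustrative coordinates only
  `3/6/20` = 4636, `3/7/20` = 5883, `3/8/20` = 7375
)

# Same steps as above: date columns become rows, then are parsed as dates.
long <- wide %>%
  pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "cases") %>%
  mutate(date = lubridate::mdy(date))

long
```

Each date column of `wide` ends up as one row of `long`, with `date` stored as a Date rather than as text.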

Now we can subset the data for Italy and plot:

data_long_ital <- data_long %>% 
  filter(`Country/Region` == "Italy")

ggplot(data_long_ital, aes(x = date, y = cases, group = `Country/Region`))+
  geom_line() +
  scale_x_date(date_breaks = "1 weeks")

JBGruber

We can convert to xts and plot:

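library(xts)
# Column names look like "X3.6.20": strip the leading "X" and parse as month.day.year
dates <- as.Date(sub("X", "", names(Italy)), format = "%m.%d.%y")
plot(xts(unlist(Italy), order.by = dates), main = "xts plot")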


Some values are 0, and log2(0) is -Inf, so those are converted to NA before the log2 transformation:

library(dplyr)
# na_if() turns the zeros into NA so that log2() does not return -Inf
dates <- as.Date(sub("X", "", names(Italy)), format = "%m.%d.%y")
plot(xts(log2(na_if(unlist(Italy), 0)), order.by = dates), main = 'xts log2 plot')
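The effect of the zero handling can be checked on a few numbers in base R. This is only a sketch: the `cases` vector is made up for illustration, and `replace()` is used here in place of `dplyr::na_if()` so that no package is needed.

```r
# Made-up cumulative counts, starting with days that had no cases.
cases <- c(0, 0, 2, 3, 20)

# Without any handling, the zeros become -Inf on the log scale.
raw_log <- log2(cases)

# Replace the zeros with NA first, then take the log.
cases_na <- replace(cases, cases == 0, NA)
log2_cases <- log2(cases_na)

raw_log
log2_cases
```

The NA values are left out of the plotted line, whereas the -Inf values would otherwise interfere with the y-axis range.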

akrun