Yet another data scientist not keeping a safe distance from COVID-19 data.
I am plotting the doubling times of infections and deaths, which I calculate on a running basis over, say, the previous 14 days. I use the glm function in R to fit a log-link regression, which gives the doubling time and a confidence interval (low/high) for that value. I map the interval bounds to ymin and ymax. I get the confidence band, but it is understandably jagged, as is the data. Is there a simple way to smooth the confidence bands?
covid_infection_folding %>%
  ggplot() +
  geom_point(aes(x=Date, y=US_Infections, color="US Infections")) +
  geom_point(aes(x=Date, y=US_Deaths, color="US Deaths")) +
  geom_smooth(
    data=covid_infection_folding,
    mapping=aes(x=Date, y=US_Infections, ymin=US_Infections_low, ymax=US_Infections_high, color="US Infections"),
    stat="identity"
  ) +
  geom_smooth(
    data=covid_infection_folding,
    mapping=aes(x=Date, y=US_Deaths, ymin=US_Deaths_low, ymax=US_Deaths_high, color="US Deaths"),
    stat="identity"
  ) +
  labs(
    y="US Covid-19 Doubling Time (Days)",
    title="Doubling Time (95% confidence intervals)",
    subtitle="Based on previous 14 days"
  )
covid_infections_folding.csv
"Date","US_Infections","US_Infections_low","US_Infections_high","World_Infections","World_Infections_low","World_Infections_high","US_Deaths","US_Deaths_low","US_Deaths_high","World_Deaths","World_Deaths_low","World_Deaths_high"
2020-03-14,2.49983739883223,2.38168561730312,2.62426904052848,15.770263051682,13.9876095552037,18.0489074832855,3.99043275230409,3.69236832976168,4.32140858392337,12.8989698838882,11.4099582259867,14.8038432131489
2020-03-15,2.588057654306,2.47627312530032,2.70530124580819,14.1052241582214,12.5458193107193,16.0815164642284,4.18843478882557,3.99109724055167,4.39934762358077,11.4155530248942,10.1442558270398,13.0197551751998
2020-03-16,2.59916192635231,2.51169240835164,2.68989407531561,12.7183934752834,11.3821909647131,14.3853177366056,3.92980192885667,3.67451528683076,4.20867530268657,10.1892563438137,9.13874885076098,11.4843983540597
2020-03-17,2.50209502182501,2.41277807296645,2.59484269035652,11.5822559952822,10.4704174205308,12.9369688457438,3.65991741725404,3.38578333892515,3.96218180909321,9.22152165144878,8.36609944993533,10.2483894056885
2020-03-18,2.61215755183853,2.5110308935239,2.71767102078119,10.6653403951574,9.74603324802977,11.7586509149994,3.7539849058838,3.50894146518389,4.0213997894202,8.50182769556017,7.82837103723028,9.28511058399383
2020-03-19,2.12103811052124,1.86668177353069,2.40773667867803,9.68220899681998,8.85264088937122,10.664710714973,2.99387387850563,2.54689938108029,3.52708155887181,7.85987840683811,7.30624027338352,8.49070096408631
2020-03-20,1.99079441567267,1.81410759357454,2.18405773648449,8.80015137195741,8.06816675831183,9.65957369382066,2.80956713501713,2.47874770062383,3.18862071502173,7.25673080882034,6.76600476407113,7.81116618913957
2020-03-21,2.05278151635975,1.92120912686599,2.19340371313652,8.08237805494367,7.46494738129346,8.79499183281471,2.7798252180882,2.53323277524509,3.0530722207346,6.73383124186039,6.27979418641987,7.2451124137753
2020-03-22,2.17252528926468,2.04949059350162,2.30316218551139,7.58160689577727,7.09323516620424,8.13086338523261,2.65984002237084,2.47418817905869,2.86058591174651,6.35099806759453,5.97729955890051,6.76431384218689
2020-03-23,2.27085728917173,2.15744423749247,2.39046362864804,7.18006161242301,6.78634277949618,7.61413540189244,2.56825970738847,2.42667381891871,2.71864717722737,6.099512763381,5.8048741730648,6.41894911090602
2020-03-24,2.4531812109224,2.30077480144939,2.6164374199738,6.93254739983745,6.63082808556837,7.25805297994967,2.59389681397186,2.48718548076248,2.70554268775727,5.92711453226658,5.70556072017153,6.16268318967908
Note: the raw data is from here: https://github.com/CSSEGISandData/COVID-19
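For context, here is a minimal sketch of how the running doubling times described at the top could be derived from raw cumulative counts. It is not the exact code behind the CSV: the helper names doubling_time and rolling_doubling are hypothetical, the quasipoisson family is an assumption (the description only says "log-link regression"), and the interval is a Wald interval from confint.default.

```r
# Hypothetical helper: doubling time (in days) and an approximate 95% interval
# from a log-link GLM fitted to the cumulative counts in one window.
doubling_time <- function(counts, level = 0.95) {
  d <- data.frame(day = seq_along(counts), n = counts)
  # Assumption: quasipoisson family; only the log link is given in the question.
  fit <- glm(n ~ day, family = quasipoisson(link = "log"), data = d)
  rate <- coef(fit)[["day"]]                        # growth rate per day
  ci <- confint.default(fit, "day", level = level)  # Wald interval for the rate
  # A faster growth rate means a shorter doubling time, so the bounds swap.
  # Only meaningful while the lower bound of the rate is positive.
  c(est = log(2) / rate, low = log(2) / ci[2], high = log(2) / ci[1])
}

# Hypothetical helper: apply doubling_time to every trailing window.
rolling_doubling <- function(counts, window = 14) {
  ends <- seq(window, length(counts))
  out <- t(sapply(ends, function(i) doubling_time(counts[(i - window + 1):i])))
  data.frame(end = ends, out)
}

# Synthetic series that doubles every 3 days, so estimates should be close to 3.
counts <- round(100 * 2^((0:29) / 3))
res <- rolling_doubling(counts, window = 14)
head(res)
```

Each window is fitted independently here, which is one reason the resulting bounds can jump from day to day.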