I am trying to iterate through global health epidemic data in a database, which consists of daily cases, cumulative cases, daily deaths, and cumulative deaths (as well as some other covariates that aren't relevant here). The table is structured as follows: for each country (with country name, region, and ID listed) and each date (though not every date is present for every country; see the note further down), the daily and cumulative cases, deaths, etc. are listed.
The data looks something like this:
# A tibble: 40 x 7
   iso_code continent location    date       total_cases new_cases week
   <chr>    <chr>     <chr>       <date>           <dbl>     <dbl> <chr>
 1 AFG      Asia      Afghanistan 2020-02-24           5         5 2020-08
 2 AFG      Asia      Afghanistan 2020-02-25           5         0 2020-08
 3 AFG      Asia      Afghanistan 2020-02-26           5         0 2020-08
 4 AFG      Asia      Afghanistan 2020-02-27           5         0 2020-08
 5 AFG      Asia      Afghanistan 2020-02-28           5         0 2020-08
 6 AFG      Asia      Afghanistan 2020-02-29           5         0 2020-08
 7 AFG      Asia      Afghanistan 2020-03-01           5         0 2020-09
 8 AFG      Asia      Afghanistan 2020-03-02           5         0 2020-09
 9 AFG      Asia      Afghanistan 2020-03-03           5         0 2020-09
10 AFG      Asia      Afghanistan 2020-03-04           5         0 2020-09
# ... with 30 more rows
I need to summarize the daily data into weekly data. This is no problem for a single column: using the methods described here, I should be able to aggregate the data for each week and each country as follows:
library(dplyr)
sumByColumn <- function(df, colName) {
  # the method for daily (cases/deaths)/(cases/deaths) smoothed
  df %>%
    group_by(location, week) %>%
    summarize(colName = sum(!! sym(colName)))
}
idByColumn <- function(df, colName) {
  # the method for cumulative (cases/deaths)
  df %>%
    group_by(location, week) %>%
    summarize(colName = identity(!! sym(colName)))
}
(Note that daily case/death columns are summed, whereas cumulative case/death columns are simply passed through the identity function as given. The cumulative columns are denoted as id_cols in the list of column names of df.)
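For illustration, the two per-column rules described above can also be applied in one pass with dplyr's across(), rather than one function call per column. This is only a minimal sketch on made-up data: the `toy` tibble, the `daily_cols` and `cum_cols` vectors, and the choice of the last value in each week as the weekly value of a cumulative column are all assumptions, not part of the original data or code.

```r
library(dplyr)

# Hypothetical toy data standing in for the real table
toy <- tibble(
  location    = c("A", "A", "A", "B", "B"),
  date        = as.Date(c("2020-02-24", "2020-02-25", "2020-03-01",
                          "2020-02-24", "2020-02-25")),
  week        = c("2020-08", "2020-08", "2020-09", "2020-08", "2020-08"),
  total_cases = c(5, 7, 9, 1, 4),
  new_cases   = c(5, 2, 2, 1, 3)
)

daily_cols <- c("new_cases")    # assumed: columns to be summed per week
cum_cols   <- c("total_cases")  # assumed: cumulative columns (the id_cols role)

toy_weekly <- toy %>%
  arrange(location, date) %>%            # so last() means "latest date in the week"
  group_by(location, week) %>%
  summarise(
    across(all_of(daily_cols), ~ sum(.x, na.rm = TRUE)),
    across(all_of(cum_cols), ~ last(.x)),  # assumption: weekly value = last cumulative value
    .groups = "drop"
  )

print(toy_weekly)
```

Taking the last value is one possible convention for cumulative columns; max() would give the same result whenever the cumulative series never decreases.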
However, when I try to run sumByColumn()/idByColumn() in a loop over the entire dataframe df, I run into an error. The loop is:
for (col in 1:ncol(df)) {
  colName = colnames(df)[col]
  if (col%in%id_cols) {
    df_weekly = idByColumn(df_weekly,colName)
  } else {
    df_weekly = sumByColumn(df_weekly,colName)
  }
}
I get:
Error in !sym(colName) : invalid argument type
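One possible cause, offered as an assumption rather than a confirmed diagnosis: the session info below lists plyr attached alongside dplyr, and if plyr was attached after dplyr, its summarize() masks dplyr's. plyr's summarize() does not process the !! operator, so R would evaluate !sym(colName) as a logical negation of a symbol, which fails with "invalid argument type". The sketch below calls dplyr::summarise() explicitly and uses the .data pronoun together with := so that the output column keeps the name stored in colName. The function name sumByColumn2 and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical variant of sumByColumn() with explicit dplyr:: namespacing
sumByColumn2 <- function(df, colName) {
  df %>%
    dplyr::group_by(location, week) %>%
    dplyr::summarise(
      # "{colName}" := names the result after the string held in colName
      "{colName}" := sum(.data[[colName]], na.rm = TRUE),
      .groups = "drop"
    )
}

# Made-up data for illustration only
toy <- tibble(
  location  = c("A", "A", "B"),
  week      = c("2020-08", "2020-08", "2020-08"),
  new_cases = c(5, 0, 3)
)

res <- sumByColumn2(toy, "new_cases")
print(res)
```

Namespacing the calls keeps the function independent of the order in which packages were attached.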
Note: I have computed the number of times each country appears in the dataframe (counts listed below), which corresponds to the number of days the disease was tracked in that country. Is there a way to account for this? For example, when I go through the weeks, if a country has no data for a given week, or the number of countries reporting differs from week to week, I would like to skip those cases rather than return NA.
916
916
910
892
884
899
971
938
899
946
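As a point of reference for the question about gaps: group_by() followed by summarise() only produces rows for the (location, week) combinations that actually occur in the data, so a country with no rows in a given week gets no row for that week rather than an NA. NA values inside a week can be dropped with na.rm = TRUE. A small sketch with hypothetical data (the `gaps` tibble and the `n_days` column are made up for illustration):

```r
library(dplyr)

# Made-up data: country B has no rows for week 2020-08,
# and country A has one missing value inside week 2020-08
gaps <- tibble(
  location  = c("A", "A", "A", "B"),
  week      = c("2020-08", "2020-08", "2020-09", "2020-09"),
  new_cases = c(5, NA, 2, 4)
)

weekly <- gaps %>%
  group_by(location, week) %>%
  summarise(
    new_cases = sum(new_cases, na.rm = TRUE),  # ignore NA days within a week
    n_days    = n(),                           # rows (days) contributing to the week
    .groups   = "drop"
  )

print(weekly)
```

Keeping a per-week row count such as n_days makes it possible to filter out incomplete weeks afterwards, if that is preferred over summing whatever days are present.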
Edit:
R Session Info is:
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_(Country).1252 LC_CTYPE=English_(Country).1252 LC_MONETARY=English_(Country).1252
[4] LC_NUMERIC=C LC_TIME=English_(Country).1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nnet_7.3-17 plyr_1.8.6 car_3.0-12 carData_3.0-5
[5] nlme_3.1-153 lubridate_1.8.0 gridExtra_2.3 ExcelFunctionsR_0.1.4
[9] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_2.1.2
[13] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
[17] dplyr_1.0.7 readxl_1.3.1 poisson_1.0
loaded via a namespace (and not attached):
[1] tseries_0.10-49 httr_1.4.2 jsonlite_1.7.3 splines_4.1.2 modelr_0.1.8
[6] assertthat_0.2.1 TTR_0.24.3 sp_1.4-6 roperators_1.2.0 cellranger_1.1.0
[11] pillar_1.7.0 backports_1.4.1 lattice_0.20-45 glue_1.6.1 quadprog_1.5-8
[16] digest_0.6.29 rvest_1.0.2 colorspace_2.0-2 Matrix_1.3-4 timeDate_3043.102
[21] pkgconfig_2.0.3 broom_0.7.12 haven_2.4.3 scales_1.1.1 tzdb_0.2.0
[26] mgcv_1.8-38 generics_0.1.2 farver_2.1.0 ellipsis_0.3.2 withr_2.5.0
[31] urca_1.3-0 cli_3.1.1 quantmod_0.4.18 magrittr_2.0.2 crayon_1.5.0
[36] forecast_8.16 fs_1.5.2 fansi_1.0.2 xts_0.12.1 xml2_1.3.3
[41] tools_4.1.2 hms_1.1.1 lifecycle_1.0.1 munsell_0.5.0 reprex_2.0.1
[46] compiler_4.1.2 rlang_1.0.1 grid_4.1.2 rstudioapi_0.13 INLA_21.11.22
[51] labeling_0.4.2 gtable_0.3.0 fracdiff_1.5-1 abind_1.4-5 DBI_1.1.2
[56] curl_4.3.2 R6_2.5.1 zoo_1.8-9 utf8_1.2.2 stringi_1.7.6
[61] parallel_4.1.2 Rcpp_1.0.8 vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.2
[66] lmtest_0.9-39