I am extracting data for specific geographical areas and specific indicators from the UK's public health agency, using fingertipsR, a package they developed for pulling data from their API. The results are inserted into an empty list of lists: one list per geography, each containing one list per indicator.
geog <- c("E38000220", "E38000046", "E38000144", "E38000191", "E38000210",
"E38000038", "E38000164", "E38000195", "E38000078", "E38000139",
"E38000166", "E38000211", "E38000147", "E38000183", "E38000028",
"E38000053", "E38000126", "E38000153", "E38000173", "E38000175"
)
indicators <- c(241, 92588, 90672, 90692, 90697, 90698, 90701, 90702, 91238,
90690, 90694, 93245, 93246, 93244, 93247, 93248, 93049, 93047,
90700)
# install.packages("fingertipsR")
library(fingertipsR)
library(dplyr)
out <- list()  # renamed from "list", which shadows base::list
start <- Sys.time()
for (geog_group in geog) {
  for (indicator_number in indicators) {
    out[[geog_group]][[as.character(indicator_number)]] <-
      fingertips_data(IndicatorID = indicator_number, AreaTypeID = c(152, 153, 154)) %>%
      filter(AreaCode == geog_group, TimeperiodSortable == max(TimeperiodSortable)) %>%
      select(Timeperiod, Value) %>%
      distinct()
  }
}
end <- Sys.time()
end-start
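One observation in case it helps frame answers: the loop downloads each indicator once per geography (20 × 19 = 380 API calls), but as far as I can tell fingertips_data() already returns rows for every area of the requested AreaTypeID, so one call per indicator should be enough. A sketch of that idea with lapply (the group_by is there because max(TimeperiodSortable) should still be taken per area, as the original per-geography filter did):

```r
library(fingertipsR)
library(dplyr)

# Sketch: 19 API calls instead of 380. Assumes the geog and indicators
# vectors defined above, and that fingertips_data() returns an AreaCode
# column covering all areas of the requested type.
out <- lapply(setNames(indicators, indicators), function(indicator_number) {
  fingertips_data(IndicatorID = indicator_number,
                  AreaTypeID = c(152, 153, 154)) %>%
    filter(AreaCode %in% geog) %>%
    group_by(AreaCode) %>%                              # latest period per area,
    filter(TimeperiodSortable == max(TimeperiodSortable)) %>%  # not across all areas
    ungroup() %>%
    select(AreaCode, Timeperiod, Value) %>%
    distinct()
})
```

Note this inverts the nesting: one data frame per indicator with an AreaCode column, rather than geography-first. split(df, df$AreaCode) on each element would recover the per-geography grouping if that structure matters.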
On my work laptop, this takes around 15 minutes to execute. I'm wondering if there are any easy ways to optimise this code, possibly with lapply or purrr?
Edit: Ideally I want the indicators for each geographical area to end up in one data frame, as they all share the same columns (Timeperiod and Value). I was going to deal with that after unlist() or something similar, but if anyone has a way to solve that inside the for loop I'm open to suggestions.
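For the combining step, dplyr::bind_rows() with its .id argument may be simpler than unlist(): it stacks a named list of data frames and records the list names in a new column. A sketch, assuming the nested list built by the loop is called out (substitute whatever name you used):

```r
library(dplyr)

# Each geography's inner list becomes one data frame; the indicator IDs
# (the inner list names) are preserved in an IndicatorID column.
per_geog <- lapply(out, bind_rows, .id = "IndicatorID")

# Or collapse everything into a single data frame, keeping both keys:
all_data <- bind_rows(per_geog, .id = "AreaCode")
```

Because the list names carry the geography and indicator keys, nothing is lost when flattening, and filter(all_data, AreaCode == "E38000220") gets back any one area.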