The World Development Indicator looks as follows
library(data.table)
WDI <- fread("CountryName CountryCode IndicatorName IndicatorCode 1960 1961 2017
ArabWorld ARB A FX.OWN.TOTL.ZS 37.16521072 37.16521072 37.16521072
ArabWorld ARB B FX.OWN.TOTL.FE.ZS 25.63540268 25.63540268 25.63540268
ArabWorld ARB C FX.OWN.TOTL.MA.ZS 48.32851791 48.32851791 48.32851791
ArabWorld ARB D FX.OWN.TOTL.OL.ZS 42.54204559 42.54204559 42.54204559
ArabWorld ARB E FX.OWN.TOTL.40.ZS 27.72478104 27.72478104 27.72478104
ArabWorld ARB F FX.OWN.TOTL.PL.ZS 26.45811081 26.45811081 26.45811081
ArabWorld ARB G FX.OWN.TOTL.60.ZS 43.44695282 43.44695282 43.44695282
ArabWorld ARB H FX.OWN.TOTL.SO.ZS 48.66697693 48.66697693 48.66697693
ArabWorld ARB I FX.OWN.TOTL.YG.ZS 20.95479965 20.95479965 20.95479965
", header = TRUE)
I used the following code to reshape the World Development Indicator Database from the World Bank.
library(dplyr)
library(tidyr)
WDI <- WDI %>%
select(-`Indicator Name`) %>%
gather(Year, val,`1960`:`2017`) %>%
spread(`Indicator Code`, val)
It used to work without any issues, but for some reason it now requires too much memory for the operation to be completed.
I have tried to remove all other databases from the work space, gc()
, closing all other programs on the computer and to reduce the years used to reshape, but that did not fix the problem. As a result, I wonder whether there is a less memory intensive way to deal with the issue.
EDIT 1: According to this post dcast.data.table
or reshape
(because of not running out of memory) is the way to go. I am however having a lot of trouble rewriting the syntax (I got the dplyr answer my posting a question as well) How would I rewrite the dplyr
code for using dcast/reshape?
How do the terms select
, gather
, spread
, relates to dcast and reshape?
EDIT 2: I have first tried to melt the data.table as follows:
WDI = melt(WDI, id.vars = c("IndicatorCode", "CountryName", "CountryCode"),
# measure.vars = -c("IndicatorCode", "CountryName", "CountryCode", "IndicatorName"))
measure.vars = c("1960", "1961","2017"))
colnames(WDI)[4] <- "year"
WDI = dcast(WDI, CountryName + CountryCode + year ~ IndicatorCode, value.var="value")
But then I get the "warning" Aggregation function missing: defaulting to length
and all entries are just 1 instead of the values. This apparently happens when the combination of entries are not unique. I am however pretty sure that they are (the combination of the country and the Indicator, should make the entry unique).