1

I'm querying data from the IMF using the package IMFData (https://github.com/mingjerli/IMFData).

The problem is that I do not know how to improve my code. There are 25 countries that I would like to extract data. I am sure my code is not the best to quickly create the desired dataframe (XDF, in my code).

library(IMFData)

databaseID <- "DOT"
startdate = "2013-01-01"
enddate = "2019-12-31"
checkquery = FALSE

queryfilter <- list(CL_FREQ = "M", CL_AREA_DOT = c("CA", "JP"), CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00")
EXPORTS <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery)

# I would like to improve this part:

XDF.1 <- EXPORTS$Obs[[1]]
XDF.2 <- EXPORTS$Obs[[2]]

XDF <- dplyr::left_join(XDF.1, XDF.2, by=c("@TIME_PERIOD"))

colnames(XDF) <- c("Date", "Canada", "Japan")
bttomio
  • 2,206
  • 1
  • 6
  • 17

2 Answers2

2

You might want to create a list and combine together thanks to the Reduce function (or purrr:::reduce). I often come back to this post:

Assuming you have a list of X.DFs dataframe named X_DF, you can do:

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
        list_XDF)

or, if you prefer tidyverse syntax:

list_XDF %>% purrr::reduce(left_join, by=c("@TIME_PERIOD"))
linog
  • 5,786
  • 3
  • 14
  • 28
  • Thank you for your answer. It solves the problem from this part: `XDF <- dplyr::left_join(XDF.1, XDF.2, by=c("@TIME_PERIOD"))`. Do you have any ideas on how to extract the 25 countries from `EXPORTS`, without doing this for each of them: `XDF.1 <- EXPORTS$Obs[[1]] XDF.2 <- EXPORTS$Obs[[2]]` – bttomio Apr 30 '20 at 15:46
  • 1
    Does `list_XDF <- lapply(1:25, function(i) EXPORTS$Obs[[i]])` work ? – linog Apr 30 '20 at 16:03
2

Here is a base R approach that will get you all 227 areas with data.

First, get all the areas available with DataStructureMethod. Then split the list into sets of 25 areas so that the API won't fail. Create a new empty list to hold the returned data. Next, use a for loop to iterate over all the area sets and store the results into a list element.

library(IMFData)

databaseID <- "DOT"
startdate = "2013-01-01"
enddate = "2019-12-31"

areas <- DataStructureMethod("DOT")$CL_COUNTERPART_AREA_DOT$CodeValue
areas.list <- split(areas, ceiling(seq_along(areas)/25))

result.list <- list()
for(i in seq_along(areas.list)) {
  filter <- list(CL_FREQ = "M", CL_AREA_DOT = areas.list[[i]], CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00")
  result.list[[i]] <- CompactDataMethod(databaseID, filter, startdate, enddate)
}

Now that we have all the data, we can extract the @OBS_VALUE from each area. So we can keep up with which is which, we will assign the column names to @REF_AREA. Then all we need to do is cbind all the areas together and add a time period column.

result <- sapply(result.list,function(x){y <- sapply(x$Obs,function(y){y[['@OBS_VALUE']]}); colnames(y) <- x[["@REF_AREA"]]; y})
result <- do.call(cbind,result)
result <- cbind(timeperiod = result.list[[1]]$Obs[[1]][['@TIME_PERIOD']],result)
result[1:10,1:10]
      timeperiod BB          BM         AF          BS           AL           AW          BD            BZ          AO           
 [1,] "2013-01"  "28.609779" "2.763473" "37.545734" "140.793072" "182.268383" "15.248257" "2135.314764" "26.993657" "5738.361548"
 [2,] "2013-02"  "31.408923" "2.588724" "23.319418" "51.207085"  "160.256056" "13.357883" "1883.921679" "31.959256" "5093.785673"
 [3,] "2013-03"  "26.490062" "2.161194" "34.313418" "116.533489" "187.347118" "11.807801" "2074.639533" "36.975964" "5836.777823"
 [4,] "2013-04"  "30.969022" "6.541486" "27.46926"  "79.9772"    "199.063249" "15.363928" "1996.029477" "39.84747"  "4953.276187"
 [5,] "2013-05"  "27.633188" "3.030127" "32.675746" "765.5369"   "221.793898" "13.232063" "2247.850876" "73.201747" "5425.804703"
 [6,] "2013-06"  "24.064953" "2.816781" "29.454347" "60.756462"  "201.765833" "13.698186" "2291.680871" "32.821853" "5271.431577"
 [7,] "2013-07"  "26.25563"  "2.657042" "15.540238" "95.12846"   "233.746903" "14.499091" "2359.924118" "33.763333" "5666.628083"
 [8,] "2013-08"  "26.85187"  "2.883294" "21.369248" "74.317362"  "180.045606" "15.545374" "1985.100494" "31.342921" "5557.632778"
 [9,] "2013-09"  "25.025515" "3.368449" "26.061924" "89.380055"  "211.352443" "12.323627" "2441.630301" "25.107398" "5558.266666"
[10,] "2013-10"  "34.040048" "3.249082" "49.352241" "128.44329"  "227.724296" "17.172523" "2131.788729" "28.489788" "5411.943251"

As you probably know, the names of those areas are available in DataStructureMethod("DOT")$CL_COUNTERPART_AREA_DOT.

Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • This seems to be the right direction. Thank you. Nevertheless, when I try to extract specific countries, I got this error (slightly changing the last line of your code): `result <- as.data.frame(cbind(timeperiod = result.list[[1]]$Obs[[1]][['@TIME_PERIOD']],result)) result <- dplyr::select(result, "CA", "JP", "US", "PL", "TW", "MX", "BR", "HU", "SG", "ZA", "HK", "NZ", "RU", "CN", "CH", "KR", "ID", "NO", "U2", "AU", "DK", "GB", "SE", "TR")`. The error message is: `Error: Can't subset columns that don't exist. x The column `TW` doesn't exist.` – bttomio Apr 30 '20 at 16:34
  • I've tried this approach, using `databaseID`, `startdate`, and `enddate`: `countries = c("CA", "JP", "US", "PL", "TW", "MX", "BR", "HU", "SG", "ZA", "HK", "NZ", "RU", "CN", "CH", "KR", "ID", "NO", "U2", "AU", "DK", "GB", "SE", "TR") queryfilter <- list(CL_FREQ = "M", CL_AREA_DOT = countries, CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00") EXPORTS <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery) list2env(EXPORTS, .GlobalEnv) XDF = do.call(what = cbind, args = Obs) XDF <- XDF[ , -which(names(XDF) %in% c("@TIME_PERIOD"))]` – bttomio Apr 30 '20 at 16:40
  • It doesn't look like the API returns data for "TW". – Ian Campbell Apr 30 '20 at 16:43
  • You're right. "U2" seems to be missing too. It's weird for Taiwan because it's available (https://data.imf.org/regular.aspx?key=61013712). – bttomio Apr 30 '20 at 16:51