Merging listed data.frames from a data.frame

Question

I'm querying data from the IMF using the package IMFData (https://github.com/mingjerli/IMFData).

The problem is that I do not know how to improve my code. There are 25 countries that I would like to extract data. I am sure my code is not the best to quickly create the desired dataframe (XDF, in my code).

library(IMFData)

databaseID <- "DOT"
startdate = "2013-01-01"
enddate = "2019-12-31"
checkquery = FALSE

queryfilter <- list(CL_FREQ = "M", CL_AREA_DOT = c("CA", "JP"), CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00")
EXPORTS <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery)

# I would like to improve this part:

XDF.1 <- EXPORTS$Obs[[1]]
XDF.2 <- EXPORTS$Obs[[2]]

XDF <- dplyr::left_join(XDF.1, XDF.2, by=c("@TIME_PERIOD"))

colnames(XDF) <- c("Date", "Canada", "Japan")

score 2 · Answer 1 · answered Apr 30 '20 at 15:25

2

You might want to create a list and combine together thanks to the Reduce function (or purrr:::reduce). I often come back to this post:

Assuming you have a list of X.DFs dataframe named X_DF, you can do:

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
        list_XDF)

or, if you prefer tidyverse syntax:

list_XDF %>% purrr::reduce(left_join, by=c("@TIME_PERIOD"))

answered Apr 30 '20 at 15:25

linog

5,786
3
14
28

Thank you for your answer. It solves the problem from this part: `XDF <- dplyr::left_join(XDF.1, XDF.2, by=c("@TIME_PERIOD"))`. Do you have any ideas on how to extract the 25 countries from `EXPORTS`, without doing this for each of them: `XDF.1 <- EXPORTS$Obs[[1]] XDF.2 <- EXPORTS$Obs[[2]]` – bttomio Apr 30 '20 at 15:46
1

Does `list_XDF <- lapply(1:25, function(i) EXPORTS$Obs[[i]])` work ? – linog Apr 30 '20 at 16:03

Ian Campbell · Accepted Answer · 2020-04-30T16:22:37.667

Here is a base R approach that will get you all 227 areas with data.

First, get all the areas available with DataStructureMethod. Then split the list into sets of 25 areas so that the API won't fail. Create a new empty list to hold the returned data. Next, use a for loop to iterate over all the area sets and store the results into a list element.

library(IMFData)

databaseID <- "DOT"
startdate = "2013-01-01"
enddate = "2019-12-31"

areas <- DataStructureMethod("DOT")$CL_COUNTERPART_AREA_DOT$CodeValue
areas.list <- split(areas, ceiling(seq_along(areas)/25))

result.list <- list()
for(i in seq_along(areas.list)) {
  filter <- list(CL_FREQ = "M", CL_AREA_DOT = areas.list[[i]], CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00")
  result.list[[i]] <- CompactDataMethod(databaseID, filter, startdate, enddate)
}

Now that we have all the data, we can extract the @OBS_VALUE from each area. So we can keep up with which is which, we will assign the column names to @REF_AREA. Then all we need to do is cbind all the areas together and add a time period column.

result <- sapply(result.list,function(x){y <- sapply(x$Obs,function(y){y[['@OBS_VALUE']]}); colnames(y) <- x[["@REF_AREA"]]; y})
result <- do.call(cbind,result)
result <- cbind(timeperiod = result.list[[1]]$Obs[[1]][['@TIME_PERIOD']],result)
result[1:10,1:10]
      timeperiod BB          BM         AF          BS           AL           AW          BD            BZ          AO           
 [1,] "2013-01"  "28.609779" "2.763473" "37.545734" "140.793072" "182.268383" "15.248257" "2135.314764" "26.993657" "5738.361548"
 [2,] "2013-02"  "31.408923" "2.588724" "23.319418" "51.207085"  "160.256056" "13.357883" "1883.921679" "31.959256" "5093.785673"
 [3,] "2013-03"  "26.490062" "2.161194" "34.313418" "116.533489" "187.347118" "11.807801" "2074.639533" "36.975964" "5836.777823"
 [4,] "2013-04"  "30.969022" "6.541486" "27.46926"  "79.9772"    "199.063249" "15.363928" "1996.029477" "39.84747"  "4953.276187"
 [5,] "2013-05"  "27.633188" "3.030127" "32.675746" "765.5369"   "221.793898" "13.232063" "2247.850876" "73.201747" "5425.804703"
 [6,] "2013-06"  "24.064953" "2.816781" "29.454347" "60.756462"  "201.765833" "13.698186" "2291.680871" "32.821853" "5271.431577"
 [7,] "2013-07"  "26.25563"  "2.657042" "15.540238" "95.12846"   "233.746903" "14.499091" "2359.924118" "33.763333" "5666.628083"
 [8,] "2013-08"  "26.85187"  "2.883294" "21.369248" "74.317362"  "180.045606" "15.545374" "1985.100494" "31.342921" "5557.632778"
 [9,] "2013-09"  "25.025515" "3.368449" "26.061924" "89.380055"  "211.352443" "12.323627" "2441.630301" "25.107398" "5558.266666"
[10,] "2013-10"  "34.040048" "3.249082" "49.352241" "128.44329"  "227.724296" "17.172523" "2131.788729" "28.489788" "5411.943251"

As you probably know, the names of those areas are available in DataStructureMethod("DOT")$CL_COUNTERPART_AREA_DOT.

This seems to be the right direction. Thank you. Nevertheless, when I try to extract specific countries, I got this error (slightly changing the last line of your code): `result <- as.data.frame(cbind(timeperiod = result.list[[1]]$Obs[[1]][['@TIME_PERIOD']],result)) result <- dplyr::select(result, "CA", "JP", "US", "PL", "TW", "MX", "BR", "HU", "SG", "ZA", "HK", "NZ", "RU", "CN", "CH", "KR", "ID", "NO", "U2", "AU", "DK", "GB", "SE", "TR")`. The error message is: `Error: Can't subset columns that don't exist. x The column `TW` doesn't exist.` — bttomio, Apr 30 '20 at 16:34
I've tried this approach, using `databaseID`, `startdate`, and `enddate`: `countries = c("CA", "JP", "US", "PL", "TW", "MX", "BR", "HU", "SG", "ZA", "HK", "NZ", "RU", "CN", "CH", "KR", "ID", "NO", "U2", "AU", "DK", "GB", "SE", "TR") queryfilter <- list(CL_FREQ = "M", CL_AREA_DOT = countries, CL_INDICATOR_DOT = "TXG_FOB_USD", CL_COUNTERPART_AREA_DOT = "W00") EXPORTS <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery) list2env(EXPORTS, .GlobalEnv) XDF = do.call(what = cbind, args = Obs) XDF <- XDF[ , -which(names(XDF) %in% c("@TIME_PERIOD"))]` — bttomio, Apr 30 '20 at 16:40
You're right. "U2" seems to be missing too. It's weird for Taiwan because it's available (https://data.imf.org/regular.aspx?key=61013712). — bttomio, Apr 30 '20 at 16:51

Merging listed data.frames from a data.frame

2 Answers2