
I am new to web scraping. I would like to pull out data from this website: https://bpstat.bportugal.pt/dados/explorer

I have managed to get a response using the GET() function from the httr package (though it does not succeed every time I run the code).

library(httr)
URL <- "https://bpstat.bportugal.pt/dados/explorer"
r <- GET(URL)
r
#> Response [https://bpstat.bportugal.pt/dados/explorer]
#>   Date: 2020-04-09 22:25
#>   Status: 200
#>   Content-Type: text/html; charset=utf-8
#>   Size: 3.36 kB
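Since the request does not succeed on every run, one option (a general sketch, not specific to this site) is to use httr's RETRY() instead of GET(), so transient failures are retried automatically, and to check the status before reading the body:

```r
library(httr)

URL <- "https://bpstat.bportugal.pt/dados/explorer"

# RETRY() re-issues the request (up to `times` attempts, with
# backoff between attempts) on connection errors and 5xx responses.
r <- RETRY("GET", URL, times = 5)
stop_for_status(r)  # raises an R error on any non-2xx status

# Read the body as text only once we know the request succeeded
html <- content(r, as = "text", encoding = "UTF-8")
```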

I would like to send a request that replicates these steps, which I currently perform manually:

  • Accept the cookies on the first page

  • In the top right corner, select EN for English

  • Filter by domains – External statistics – Balance of payments

  • External operations - Balance of payments – current and capital accounts – current account – Goods and services account (highlight the following selection):

  • Goods account; Services account; Manufacturing services on physical inputs; Maintenance and repair services; Transport services; Travel; Construction services; Insurance and pension services; Financial services; Charges for the use of intellectual property; Telecommunication, computer & information services; Other services provided by companies; Personal, cultural and recreational services; Government goods and services

  • Counterparty territory: All countries

  • Data type: Credit; Debit

  • Periodicity: Monthly

  • Unit of Measure: Millions of Euros

  • Select all series (click them so they are highlighted in dark blue; at the top of the page, click "Selected members" and then "go to associated series")

  • Go to the associated series (at the bottom of the screen, increase the number shown per page from 10 to 50)

  • Manually tick all boxes except for "seasonally adjusted"

  • Go to "Selection list", then select "See in Table"

  • Download the Excel file via the three vertical dots at the top ("visible data only")

I have seen a couple of examples, such as "Send a POST request using the httr R package", but I don't know what inputs I need to provide...

  • This looks like a job for [`rvest`](https://cran.r-project.org/web/packages/rvest/index.html). It is significantly better equipped for dealing with the tasks you've identified: select something, look in a table, download something. – r2evans Apr 09 '20 at 23:06
  • I actually use this package to read HTML pages and retrieve URLs of existing files. However, what I need is to query the page https://bpstat.bportugal.pt/dados/explorer and then read the data or download the associated Excel file. I do not know how to do that with rvest, nor with httr. – The-Dancing-Machine-Learning Apr 10 '20 at 10:52
  • I have tried to use rvest's functionality, using https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/ as an example, but it does not seem to work in my case. – The-Dancing-Machine-Learning Apr 10 '20 at 11:42
  • I think you may need something like [`RSelenium`](https://cran.r-project.org/web/packages/RSelenium/index.html) if you want to automate selection of series, to be honest. There's enough javascript and other fancy things there that I don't think that `rvest` will suffice. Having said that, once you know all of the `series_id`s that you need, you can just form the URL like this (I downloaded a CSV with a half dozen selected): https://bpstat.bportugal.pt/api/observations/csv/?series_ids=12509268,12510231,12510153,12514786,12509543,12512606,12509469&language=EN, perhaps that can be repeated? – r2evans Apr 10 '20 at 14:56
  • Thank you, I did not realize the URL of the CSV contained the required information! – The-Dancing-Machine-Learning Apr 10 '20 at 22:49
  • It's a common mistake (I've made it) to think that fancy interfaces like that always require the use of `RSelenium`. Sometimes that's correct, but if you know how/where to look, often you find these shortcuts. – r2evans Apr 10 '20 at 23:30
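The CSV endpoint mentioned in the comments can be assembled in R. A minimal sketch, using the example series IDs from the comment above (substitute the IDs of the series you actually selected):

```r
# Example series IDs taken from the comment above; replace with your own.
series_ids <- c(12509268, 12510231, 12510153, 12514786,
                12509543, 12512606, 12509469)

# Assemble the CSV download URL in the same form as the comment's example
csv_url <- paste0(
  "https://bpstat.bportugal.pt/api/observations/csv/",
  "?series_ids=", paste(series_ids, collapse = ","),
  "&language=EN"
)
```

`read.csv(csv_url)` (or `httr::GET(csv_url)` followed by `content()`) should then download the observations directly, with no browser interaction.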

1 Answer


That website has a documented API at https://bpstat.bportugal.pt/data/docs/ which you can use to pull data instead of trying to scrape the pages.

The outputs are in JSON-stat format, and you can use the rjstat package (https://github.com/ajschumacher/rjstat) to make them easier to handle.
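To see how rjstat represents the data, here is a small self-contained sketch (not specific to this API) that round-trips a toy data frame through JSON-stat with `toJSONstat()`/`fromJSONstat()`; the same `fromJSONstat()` call would be applied to the text of an actual API response:

```r
# install.packages("rjstat")  # JSON-stat reader/writer, available on CRAN
library(rjstat)

# A tiny stand-in for an API response: a data frame whose `value`
# column holds the observations and whose other columns are dimensions.
df <- data.frame(account = c("Goods", "Services"),
                 value   = c(100, 200))

js  <- toJSONstat(df)    # serialize to JSON-stat text
out <- fromJSONstat(js)  # parse back into data-frame form
```

Depending on the rjstat version and the JSON-stat class of the input, `fromJSONstat()` may return a single data frame or a list of data frames (one per dataset).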