Extract data URL with javascript (table in php)

Question

I want to extract the data from this web page, http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php, which uses JavaScript. So far I have not found a way to extract the daily volume and price data.

I have tried many of the alternatives presented on this site, but none have worked for me because the table is obtained in two steps.

I have tried to adapt the code that appears at https://www.r-bloggers.com/2020/04/an-adventure-in-downloading-books/, but I couldn't download the data.

My version is:

library(Rcrawler)

install_browser() # One time only

br <- run_browser()

page<-LinkExtractor(url="http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php",
                    Browser = br, ExternalLInks = TRUE)


el <- page$InternalLinks
sprlnks <- el[grep("emmsa", el, fixed = TRUE)]

for (sprlnk in sprlnks) {
  spr_page <- LinkExtractor(sprlnk)
  il <- spr_page$InternalLinks
  ttl <- spr_page$Info$Title
  ttl <- trimws(strsplit(ttl, "|", fixed = TRUE)[[1]][1])
  chapter_link <- il[grep("chapter", il, fixed = TRUE)][1]
  chp_splits <- strsplit(chapter_link, "/", fixed = TRUE)
  n <- length(chp_splits[[1]])
  suff <- chp_splits[[1]][n]
  suff <- gsub(".{2}$", "", suff)
  pref <- chp_splits[[1]][n-1]
  final_url <- paste0("http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php", pref, "/",
                      suff, ".php")
  print(final_url)
  download.file(final_url, paste0(ttl, ".php"), mode = "wb")
  Sys.sleep(5)
}

stop_browser(br)

I get a file, "Empresa Municipal de Mercados S.A.php", that is downloaded over and over, and in which line 294 appears.

In the end, what I want is help writing a script that downloads the daily price and volume data from the "emmsa" website.

QHarr · Answer 1 · 2022-06-18T04:53:57.860


You could make a POST request, as the page does, and parse the table out of the response:

library(httr)
library(rvest)
library(janitor)
library(dplyr)

# The page submits its form as URL-encoded data
headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")

# vid_tipo=1 requests prices; vfecha is the report date (dd/mm/yyyy)
data <- "vid_tipo=1&vprod=&vvari=&vfecha=15/06/2022"

r <- httr::POST(
  url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
  httr::add_headers(.headers = headers),
  body = data
)

# Parse the returned HTML and tidy the price table
t <- content(r) %>%
  html_element(".timecard") %>%
  html_table() %>%
  row_to_names(1) %>%                              # first row holds the column names
  clean_names() %>%
  dplyr::filter(producto != "") %>%                # drop rows without a product
  mutate_at(vars(matches("precio")), as.numeric)   # price columns to numeric

For the volume option, the returned HTML is different, so the selector and the clean-up steps change:

library(httr)
library(rvest)
library(janitor)
library(dplyr)

headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")

# vid_tipo=2 requests volume
data <- "vid_tipo=2&vprod=&vvari=&vfecha=17/06/2022"

r <- httr::POST(
  url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
  httr::add_headers(.headers = headers),
  body = data
)

# The volume table has its own id and already carries proper headers
t <- content(r) %>%
  html_element("#tbReport") %>%
  html_table() %>%
  clean_names()
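
To collect several days, the same request can be repeated with a different vfecha value. The following is only a minimal sketch, assuming the endpoint accepts any dd/mm/yyyy date in the same way as the two dates used above. `build_body` and `fetch_report` are hypothetical helper names introduced here for illustration; they are not part of httr or of the page.

```r
# Hypothetical helper: build the form body for one or more dates.
# tipo = 1 requests prices, tipo = 2 requests volume, as in the code above.
build_body <- function(date, tipo = 1) {
  paste0(
    "vid_tipo=", tipo,
    "&vprod=&vvari=&vfecha=", format(as.Date(date), "%d/%m/%Y")
  )
}

# Hypothetical helper: send the POST request for a single date.
# It is only defined here; nothing is sent until it is called.
fetch_report <- function(date, tipo = 1) {
  httr::POST(
    url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
    httr::add_headers("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8"),
    body = build_body(date, tipo)
  )
}

# One request body per day in the chosen range (volume report)
dates <- seq(as.Date("2022-06-15"), as.Date("2022-06-17"), by = "day")
bodies <- build_body(dates, tipo = 2)
print(bodies)
```

Each response from `fetch_report(date, tipo)` could then be parsed with the matching pipeline above (`.timecard` for prices, `#tbReport` for volume), ideally with a short `Sys.sleep()` between requests so the server is not hit too quickly.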

edited Jun 18 '22 at 04:53

answered Jun 16 '22 at 03:14

QHarr



Thank you, this is fantastic! With your help I was able to find the specific answer I needed – Carlos Garibotto Jun 16 '22 at 15:26
please see edit to answer – QHarr Jun 18 '22 at 04:54
I tried to adapt your code to get the "volumen" variable: I set `data <- "vid_tipo=2&vprod=&vvari=&vfecha=15/06/2022"` and replaced the price variable with volume in the last line of chunk t, `mutate_at(vars(matches("volumen")), as.numeric)`, but I get: Error in UseMethod("html_table") : no applicable method for 'html_table' applied to an object of class "xml_missing" – Carlos Garibotto Jun 21 '22 at 23:38
For volume you just run the code as I wrote it. The html and the headers are different. – QHarr Jun 21 '22 at 23:45

Thank you @QHarr, I really needed to find the answer. I don't know anything about web scraping. "Master", you are on another level! – Carlos Garibotto Jun 22 '22 at 17:08
