I need to extract the data from all the tables on this website, http://ncpscxx.moa.gov.cn/#/sing?headingIndex=true&item=1, but I haven't had any success.

I tried rvest, but it finds no tables:

library(tidyverse)
library(dplyr)
library(rvest)

url <- "http://ncpscxx.moa.gov.cn/#/sing?headingIndex=true&item=1"

page <- read_html(url)

tables <- page %>%
  html_table(fill = TRUE)

View(tables) # The result is an empty list :(

How can I solve this?

MrFlick
  • Have you tried looking for posts about scraping tables in R? You will likely have to find the specific XPath if there are multiple. https://stackoverflow.com/questions/72944527/reading-in-a-table-using-rvest/72944693#72944693 https://stackoverflow.com/questions/31176709/load-a-table-from-wikipedia-into-r/31177077#31177077 https://stackoverflow.com/questions/50310595/data-scraping-in-r/50382537#50382537 – dcsuka Aug 26 '22 at 03:59

1 Answer

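The data you see on the screen is not in the HTML that read_html() downloads; the page loads it afterwards with JavaScript. You can use a package like "httr2" or "httr" to request the data directly from the endpoints the page calls, which are listed in the Network tab of the browser's developer tools.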
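To see why `html_table()` comes back empty: everything after `#` in the URL is a fragment that is never sent to the server, so `read_html()` only receives the page shell. Below is a minimal sketch, assuming the shell looks roughly like the hypothetical HTML in the code (the real markup was not inspected and may differ).

```r
library(rvest)

# Hypothetical stand-in for the page shell: an empty container that
# JavaScript would fill in later. The real markup may differ.
shell <- minimal_html('<div id="app"></div>')

# The static HTML contains no <table> elements, so nothing is extracted
tables <- html_table(shell)
length(tables)
```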

Here is an example for the dataset in the bottom right of the page.

library(httr2)
library(dplyr)  # for %>% and as_tibble()

"http://ncpscxx.moa.gov.cn/product/livestock-product-feed/trend/count" %>%
  request() %>%
  req_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 Edg/104.0.1293.63") %>%
  # Attaching a JSON body makes httr2 send the request as a POST
  req_body_json(list(
    varietyCode = "AL01001",
    queryEndTime = "2020",
    queryStartTime = "2011"
  )) %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  getElement(4) %>%  # the fourth element of the parsed response holds the table
  as_tibble()

# A tibble: 7 x 3
  AMOUNT_FEED CHANGE_RATE REPORT_TIME
        <int> <chr>       <chr>
1        8612 -           2014年
2        7949 -7.7        2015年
3        8298 4.39        2016年
4        9078 9.4         2017年
5        9584 5.57        2018年
6        7651 -20.17      2019年
7        8874 15.98       2020年
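
Both CHANGE_RATE and REPORT_TIME come back as character columns. Here is a minimal base R sketch for converting them, using three rows copied from the output above. The new column names `change_rate_num` and `year` are illustrative choices, not part of the API, and treating "-" as a missing value is an assumption based on how it appears in the first row.

```r
# Three rows copied from the output above ("\u5e74" is the year character)
feed <- data.frame(
  AMOUNT_FEED = c(8612L, 7949L, 8298L),
  CHANGE_RATE = c("-", "-7.7", "4.39"),
  REPORT_TIME = c("2014\u5e74", "2015\u5e74", "2016\u5e74"),
  stringsAsFactors = FALSE
)

# Assumption: "-" marks a missing change rate, so map it to NA before converting
rate <- ifelse(feed$CHANGE_RATE == "-", NA_character_, feed$CHANGE_RATE)
feed$change_rate_num <- as.numeric(rate)

# Keep the first four characters (the year) and store them as an integer
feed$year <- as.integer(substr(feed$REPORT_TIME, 1, 4))
```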

Chamkrai
  • Thanks a lot @Tom Hoel! But how can I change your code example to get the other tables? – Rodrigo H. Ozon Aug 26 '22 at 09:53
  • @RodrigoH.Ozon You need to inspect the Network tab in the developer tools to gather the link for each table. Unfortunately, I won't be doing that, as one example should be sufficient. Learn more about `httr2` here: https://httr2.r-lib.org/ – Chamkrai Aug 26 '22 at 10:08
  • I'm not the best at HTML, but I tried to find the same link that you used in the example and couldn't find it. I'll keep trying! Thanks a lot @Tom Hoel! – Rodrigo H. Ozon Aug 26 '22 at 19:22
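
On adapting the example to other tables, as asked in the comments: the request from the answer can be wrapped in small helpers so that only the endpoint path and the JSON body change. This is a sketch, not a tested scraper. `build_moa_request()` and `fetch_moa()` are hypothetical helper names, and the path and body for each additional table still have to be copied from the Network tab, since only one endpoint is known from the answer. The position of the table inside the response (element 4 in the answer) may also differ per endpoint, so `fetch_moa()` returns the whole parsed body.

```r
library(httr2)

base_url <- "http://ncpscxx.moa.gov.cn"

# User agent string copied from the answer
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 Edg/104.0.1293.63"

# Hypothetical helper: build (but do not send) a request with a JSON body
build_moa_request <- function(path, body) {
  request(paste0(base_url, path)) %>%
    req_user_agent(ua) %>%
    req_body_json(body)
}

# Hypothetical helper: send the request and parse the JSON response
fetch_moa <- function(path, body) {
  build_moa_request(path, body) %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE)
}

# The one endpoint and body known from the answer
req <- build_moa_request(
  "/product/livestock-product-feed/trend/count",
  list(varietyCode = "AL01001", queryStartTime = "2011", queryEndTime = "2020")
)
req$url
```

Calling `fetch_moa()` with the same two arguments performs the request; inspect the result with `str()` to find which element holds the table before extracting it.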