I am trying to scrape the list of the top 100 universities from this website (topuniversities.com).

Using =IMPORTXML("https://www.topuniversities.com/university-rankings/usa-rankings/2021","//*[@id='ranking-data-load']/div[1]/div/div/div/div[2]")

gives the error: Imported content is empty.

How can I use XPath to fetch the required data?

vish

1 Answer

I found this XHR request in the browser's developer tools:

https://www.topuniversities.com/sites/default/files/qs-rankings-data/en/3738856.txt?1622189434?v=1622361479157

Your XPath won't work because the content it targets only exists after JavaScript has rendered the page.

To get the data, you have two choices:

  • Selenium driving a real browser (needs a webdriver for Chrome, Firefox, etc.)

  • gather the appropriate headers and data, then send the request yourself with the Requests module

Here is the code for the second option:

import re

import requests

# XHR endpoint found in the browser's developer tools
URL = 'https://www.topuniversities.com/sites/default/files/qs-rankings-data/en/3738856.txt?1622189434?v=1622361479157'

# Request headers gathered to mimic the browser's XHR call
headers = {
   "Host": "www.topuniversities.com",
   "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux armv8l; rv:88.0) Gecko/20100101 Firefox/88.0",
   "Accept": "application/json, text/javascript, */*; q=0.01",
   "Accept-Language": "en-US,en;q=0.5",
   "Accept-Encoding": "gzip, deflate",
   "Referer": "https://www.topuniversities.com/university-rankings/usa-rankings/2021",
   "X-Requested-With": "XMLHttpRequest",
   "via": "1.1 google"
}

# The endpoint returns JSON even though the file ends in .txt
datas = requests.get(URL, headers=headers).json()

# Each entry's 'title' field is an HTML snippet; extract the link text.
# The non-greedy (.*?) stops at the first closing </a>.
for i in datas['data']:
    for j in re.findall(r'class="uni-link">(.*?)</a>', i['title']):
        print(j)

This prints:

Harvard University
Stanford University
Massachusetts Institute of Technology (MIT)
University of California, Berkeley (UCB)
University of California, Los Angeles (UCLA)
Yale University
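
If the regex turns out to be fragile, the same extraction can be sketched with Python's built-in HTML parser. This is only a sketch under assumptions: the `sample` payload below is made up to mimic the shape used above (a `data` list whose entries carry an HTML `title` containing an `<a class="uni-link">` element), and `UniLinkParser` and `extract_names` are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class UniLinkParser(HTMLParser):
    """Collect the text of every <a> element whose class list has 'uni-link'."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_link = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "uni-link" in classes:
            self._in_link = True
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link
        if self._in_link:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.names.append("".join(self._chunks).strip())
            self._in_link = False


def extract_names(payload):
    """Return the university names found in payload['data'][n]['title']."""
    parser = UniLinkParser()
    for row in payload["data"]:
        parser.feed(row["title"])
    parser.close()
    return parser.names


# Made-up sample; the real markup around the link is an assumption
sample = {
    "data": [
        {"title": '<div class="td-wrap"><a href="/universities/harvard-university" class="uni-link">Harvard University</a></div>'},
        {"title": '<div class="td-wrap"><a href="/universities/stanford-university" class="uni-link">Stanford University</a></div>'},
    ]
}

print(extract_names(sample))
```

With the real response, `extract_names(datas)` would take the place of the regex loop. A parser copes better than a regex when the attribute order changes or when names contain HTML entities such as `&amp;`.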
rene
  • @rene can you please tell me how you found the developer tools and this URL: https://www.topuniversities.com/sites/default/files/qs-rankings-data/en/3738856.txt?1622189434?v=1622361479157 – vish May 30 '21 at 10:20
  • @vish I didn't write the answer, I only edited it to make it a bit clearer. I have no idea what this user did to reach their answer. – rene May 30 '21 at 11:19
  • @Sheshanandh can you please tell me how you found the XHR request in the developer tools and this URL? – vish May 30 '21 at 12:39