
I'm trying to read a URL exactly as has been suggested here.

However, the output doesn't contain the same content that I see on the site itself.

import urllib.request

link = "http://www.primatiming.com/#/participant/11/40/37380"
f = urllib.request.urlopen(link)
myfile = f.read()
print(myfile)


b'<!doctype html>\n<html lang="en">\n<head>\n  <meta charset="utf-8">\n
<title>primatiming</title>\n  <base href="/">\n\n  <meta name="viewport"
content="width=device-width, initial-scale=1">\n  <link rel="icon" type="image/x-icon"
href="favicon.ico">\n<link rel="stylesheet" href="styles.1b97fe46abe0706759da.css">
</head>\n<body>\n  <app-root></app-root>\n<script type="text/javascript"
src="runtime.a66f828dca56eeb90e02.js"></script><script type="text/javascript"
src="polyfills.7b309130c7fc8668d4f8.js"></script><script type="text/javascript" 
src="scripts.8e2ccd20353c3cf5326a.js"></script><script type="text/javascript" 
src="main.a865153f87c564c09e4f.js"></script></body>\n</html>\n'

Could anyone suggest a way of reading this page?

The reason is that I simply want to download the tables from multiple pages on the primatiming site so that I can do some data analysis on them.

Thank you.

4 Answers


It looks like you need some kind of headless web browser (or Selenium) that executes the JavaScript and renders the page, so that you can get the resulting HTML.
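A minimal sketch of that approach, assuming the `selenium` package (4.x) and a local Chrome installation. Waiting for a `table` element is an assumption based on the question saying the data lives in tables; adjust the CSS selector to whatever the rendered page actually contains.

```python
def get_rendered_html(url: str, css_selector: str = "table", timeout: float = 20.0) -> str:
    """Load `url` in headless Chrome and return the HTML after JavaScript has run.

    Assumes selenium 4.x and Chrome are installed. The default selector
    "table" is an assumption about what the rendered page contains.
    """
    # Imported lazily so this module can be imported without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript app has inserted the element we care about.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even on a timeout
```

Usage: `html = get_rendered_html("http://www.primatiming.com/#/participant/11/40/37380")`. Unlike urllib, the browser runs the page's scripts, so `page_source` reflects the rendered DOM rather than the initial shell.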

Pius Raeder

It's because the site loads its data via XHR, so you need either a headless browser or to call their public API directly.

You can see those requests in your browser's developer console.
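If the Network tab shows those XHR requests returning JSON, you can often call the endpoint directly with the standard library. A minimal sketch, assuming the endpoint returns JSON; the actual API path is not known here, so `api_url` must be copied from the request you see in the developer console.

```python
import json
import urllib.request


def fetch_json(api_url: str, timeout: float = 30.0):
    """GET `api_url` and decode the JSON body.

    `api_url` is whatever request URL you copy from the browser's Network tab;
    it is not a known, documented endpoint.
    """
    request = urllib.request.Request(
        api_url,
        # Some servers reject urllib's default User-Agent; these headers are a guess, not a requirement.
        headers={"Accept": "application/json", "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from Content-Type if the server sends one, otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Once you have the decoded JSON, it is usually easier to turn into tables for analysis than scraped HTML.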

nickanor

As you can see, you are getting HTML that is mostly script tags: the JavaScript they load is supposed to run and fetch the content separately when you visit the page.

Your browser does this automatically when you visit the page normally. To verify, open the Network tab in your browser's dev tools, check "Preserve log", and visit the target page. If you copy the response for the page's own HTML document, you'll see the same result that urllib gives you. So you need something that can run JavaScript for you to get to the data.
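You can also confirm this from Python. The sketch below uses only the standard library to check that the `<app-root>` element in the returned HTML is empty and to list the JavaScript bundles the page loads. It also shows that everything after `#` in the URL is a fragment handled in the browser and never sent to the server, so a plain HTTP request gets the same shell for every participant page. The sample HTML is an abbreviated copy of the output in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class ShellInspector(HTMLParser):
    """Collects <script src=...> values and any text inside <app-root>."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.app_root_text = ""
        self._in_app_root = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)
        elif tag == "app-root":
            self._in_app_root = True

    def handle_endtag(self, tag):
        if tag == "app-root":
            self._in_app_root = False

    def handle_data(self, data):
        if self._in_app_root:
            self.app_root_text += data


def inspect_shell(html: str):
    """Return (text inside <app-root>, list of script sources)."""
    parser = ShellInspector()
    parser.feed(html)
    parser.close()
    return parser.app_root_text.strip(), parser.script_srcs


# Abbreviated copy of the HTML printed in the question.
sample = (
    '<!doctype html><html lang="en"><head><title>primatiming</title></head>'
    "<body><app-root></app-root>"
    '<script type="text/javascript" src="runtime.a66f828dca56eeb90e02.js"></script>'
    '<script type="text/javascript" src="main.a865153f87c564c09e4f.js"></script>'
    "</body></html>"
)

content, scripts = inspect_shell(sample)
print(repr(content))  # '' : the app has not rendered anything yet
print(scripts)

# The fragment after '#' stays in the browser; the server only sees the part before it.
url, fragment = urldefrag("http://www.primatiming.com/#/participant/11/40/37380")
print(url)       # http://www.primatiming.com/
print(fragment)  # /participant/11/40/37380
```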

A popular approach is using Selenium, as suggested here.

Chillie

I don't really understand your question, but I think you want to get information from this page?

If so, I'd recommend using XPath with libxml!
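In Python, libxml2 is usually used through the `lxml` package. As a dependency-free sketch, this swaps in the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and only parses well-formed markup; for real-world pages `lxml.html` is the more robust choice. Either way, the XPath has to run on HTML that actually contains the table (for example, the page source after the JavaScript has run), not on the shell urllib returns. The table markup below is hypothetical and only illustrates the parsing.

```python
import xml.etree.ElementTree as ET


def table_rows(markup: str, table_path: str = ".//table"):
    """Return each <tr> of the first matching table as a list of cell texts.

    ElementTree supports only a subset of XPath, and `markup` must be well-formed XML.
    """
    root = ET.fromstring(markup)
    table = root.find(table_path)
    if table is None:
        return []
    rows = []
    for tr in table.iter("tr"):
        # Collect header and data cells in document order.
        cells = ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("th", "td")]
        rows.append(cells)
    return rows


# Hypothetical, well-formed table markup used only to demonstrate the parsing.
sample = """
<div>
  <table>
    <tr><th>Split</th><th>Time</th></tr>
    <tr><td>5 km</td><td>00:25:10</td></tr>
    <tr><td>10 km</td><td>00:51:02</td></tr>
  </table>
</div>
"""

for row in table_rows(sample):
    print(row)
```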

Do some research on the web about scraping.

asa