Reading URL does not show its contents

Question

I'm trying to read a URL exactly as has been suggested here.

However, I don't get the same contents in the output as what I see on the site itself.

import urllib.request

link = "http://www.primatiming.com/#/participant/11/40/37380"
f = urllib.request.urlopen(link)
myfile = f.read()  # raw response body, as bytes
print(myfile)


b'<!doctype html>\n<html lang="en">\n<head>\n  <meta charset="utf-8">\n
<title>primatiming</title>\n  <base href="/">\n\n  <meta name="viewport"
content="width=device-width, initial-scale=1">\n  <link rel="icon" type="image/x-icon"
href="favicon.ico">\n<link rel="stylesheet" href="styles.1b97fe46abe0706759da.css">
</head>\n<body>\n  <app-root></app-root>\n<script type="text/javascript"
src="runtime.a66f828dca56eeb90e02.js"></script><script type="text/javascript"
src="polyfills.7b309130c7fc8668d4f8.js"></script><script type="text/javascript" 
src="scripts.8e2ccd20353c3cf5326a.js"></script><script type="text/javascript" 
src="main.a865153f87c564c09e4f.js"></script></body>\n</html>\n'

I was wondering if someone could suggest any ways of reading this page?

The reason for doing this is that I simply want to download the tables from multiple pages on the primatiming site so that I can do some data analysis on them.

Thank you.

score 0 · Answer 1 · answered Sep 11 '18 at 07:43

It looks like you need some kind of headless web browser (or Selenium) that loads the page and runs its JavaScript, so that you can get the resulting HTML.

Pius Raeder

score 0 · Answer 2 · answered Sep 11 '18 at 07:45

It's because the site loads its data via XHR. You need to use a headless browser for that, or try their public API directly.
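If the XHR request visible in the browser's Network tab returns JSON, it can often be called directly without a browser. The following is only a sketch of that idea: I do not know the site's actual API, so the endpoint in the usage comment is a hypothetical placeholder, and `fetch_json` with its `opener` parameter is a helper invented for this example.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """Request `url` and decode the response body as JSON.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (hypothetical endpoint; copy the real one from the Network tab):
# data = fetch_json("http://www.primatiming.com/<path-of-the-xhr-request>")
```

Whether this works depends on the site: some endpoints require extra headers, cookies, or tokens that the page's JavaScript sets.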

nickanor

Chillie · Accepted Answer · 2018-09-11T13:34:57.540

As you can see, you are getting HTML that contains script tags: the JavaScript they reference is supposed to run and download the content separately when you visit the page.

Your browser does that for you automatically when you visit the page normally. To verify this, open the Network tab of your browser's dev tools, check "Preserve log", and visit the target page. If you copy the response to the initial page request, you'll see the same result that urllib gives you. So you need something that can run JavaScript for you to get to the data.
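One more detail about the URL in the question: everything after the # is a fragment, which HTTP clients never send to the server, so urllib can only ever request the site's root page. The snippet below uses the standard library to show this. The helper `looks_like_js_shell` is a rough heuristic invented for this example, not a standard check.

```python
from urllib.parse import urlsplit

link = "http://www.primatiming.com/#/participant/11/40/37380"
parts = urlsplit(link)

# The fragment is interpreted client-side by the page's JavaScript;
# only the scheme, host, path and query are sent to the server.
print(parts.netloc)    # www.primatiming.com
print(parts.path)      # /
print(parts.fragment)  # /participant/11/40/37380


def looks_like_js_shell(html: bytes) -> bool:
    """Heuristic (an assumption for illustration): the response has an
    empty <app-root> placeholder and no <table> markup yet."""
    return b"<app-root></app-root>" in html and b"<table" not in html
```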

A popular approach is using Selenium, as suggested here.
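A minimal sketch of that approach, assuming Selenium with Chrome and a matching driver are installed. It also assumes the rendered page shows its results in a `<table>` element, which I have not verified for this site. `TableRows`, `extract_rows` and `rendered_html` are names made up for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rendered_html(url, wait_seconds=10):
    """Load `url` in headless Chrome and return the DOM after scripts ran."""
    # Imported here so the parsing helpers above work without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the data appears inside a <table> once loaded.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Usage:
# rows = extract_rows(rendered_html("http://www.primatiming.com/#/participant/11/40/37380"))
```

If the site does not use a `<table>`, the wait condition and the parser would need to target whatever elements actually hold the data.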

score -1 · Answer 4 · answered Sep 11 '18 at 07:35

I don't really understand your question, but I think you want to get information from this page?

If so, I recommend using XPath with libxml.

Do some research on web scraping.

asa

If you don't understand the question, do not answer - post a comment asking for clarifications instead. – bruno desthuilliers Sep 11 '18 at 07:50
@brunodesthuilliers Totally right, He has already > 50 rep – U13-Forward Sep 11 '18 at 07:53
