How can I save source code from a website in a .txt file?

Question

import urllib.request

url = "http://" + str(input)
t = urllib.request.urlopen(url)

How can I save the source code of any website in a .txt file? I'm using Python 3.

Have a look here: http://stackoverflow.com/questions/9968091/save-html-source-code-to-file — Damián Montenegro, Aug 03 '15 at 20:28

score 3 · Answer 1 · edited Dec 26 '16 at 11:15

There are multiple ways you can get this done.

Step 1: Getting the data

This can be done with any library you like. My personal favorite is requests; the code is as follows:

import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent header
html_data = requests.get('Your url goes here', headers=headers)

This stores a Response object in html_data. To get the page source as text, use:

html_data = html_data.text
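If you would rather stay with urllib.request from the question instead of installing requests, step 1 can be sketched with the standard library alone. This is an illustrative addition, not part of the original answer: fetch_source is a hypothetical helper name, and it assumes the page is text whose charset is either declared in the Content-Type header or is UTF-8.

```python
import urllib.request


def fetch_source(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Download the page at `url` and return its source as a str (hypothetical helper)."""
    # Send a browser-like User-Agent header, like the requests example does
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()  # in Python 3 this is bytes, not str
        # Use the charset declared in the Content-Type header; assume UTF-8 if none is given
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" substitutes U+FFFD for bytes that don't fit the charset instead of raising
    return raw.decode(charset, errors="replace")
```

Usage would then be something like html_data = fetch_source("http://" + input()). The errors="replace" choice trades exactness for robustness: a page with a wrong or missing charset declaration still produces text instead of a UnicodeDecodeError.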

Step 2: Saving this data into a text file on the local machine

# Open the file for writing ('w' overwrites it, 'a' would append) with an explicit encoding
with open('your file path goes here', 'w', encoding='utf-8') as file:
    file.write(html_data)  # html_data is already a str after step 1, so no .encode() is needed
# Leaving the with block closes the file, even if an error occurred while writing

This saves the page's entire HTML source to the text file at the path you specified.
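As a reusable variant of step 2, here is a minimal sketch assuming the source is already a str (as after .text above). save_source and its append flag are hypothetical names for illustration. Passing encoding explicitly matters because open() otherwise uses a platform-dependent default encoding, which may not be able to represent every character on the page.

```python
def save_source(html: str, path: str, append: bool = False) -> None:
    """Write the HTML source `html` to the text file at `path` (hypothetical helper)."""
    mode = "a" if append else "w"  # "w" replaces the file's contents, "a" adds to the end
    # Text mode with an explicit encoding: Python encodes the str to UTF-8 while writing
    with open(path, mode, encoding="utf-8") as file:
        file.write(html)
    # The with block has already closed the file here, even if write() raised
```

Calling save_source(html_data, 'page.txt') twice leaves one copy of the page in the file; with append=True the second call adds a second copy after the first.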

If you specifically meant saving the text that is visible on the page, use an HTML parser library; I recommend BeautifulSoup.
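BeautifulSoup is the more convenient tool, but to show the idea without a third-party install, here is a rough sketch using the standard library's html.parser instead (a swapped-in technique, not BeautifulSoup's API). VisibleTextParser and visible_text are hypothetical names, and "visible" is approximated as all text outside script and style elements; CSS-hidden elements are not detected.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style> (hypothetical helper)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.parts = []  # non-empty text fragments, in document order
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text outside script/style that is not pure whitespace
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's text fragments, one per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

The result can be saved exactly like the HTML in step 2, for example by writing visible_text(html_data) instead of html_data.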

Here are links to the official documentation for the libraries used and recommended above:

Requests: link to the documentation
BeautifulSoup: link to the documentation

score 0 · Answer 2 · answered Aug 03 '15 at 20:33

There are plenty of videos and tutorials about this, but still:

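import urllib.request

# url as built in the question, e.g. "http://" + input()
# In Python 3, read() returns bytes, not str
t = urllib.request.urlopen(url).read()

# Open the file in binary mode ('wb') so the bytes are written exactly as received
with open("c:\\source_code.txt", 'wb') as source_code:
    source_code.write(t)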

answered by Belial

It shows an error: must be str, not bytes. I don't know what it means. – Algoritm Aug 03 '15 at 20:49
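The error in this comment comes from mixing bytes and str in Python 3, where the code has to use urllib.request because urllib.urlopen only exists in Python 2: read() returns bytes, while a file opened with 'w' only accepts str. Below is a minimal sketch of the two usual fixes. The data: URL is only a stand-in for a real page so the example runs without network access, and the output file names are made up.

```python
import urllib.request

# Stand-in for a real page; urlopen() also understands data: URLs
url = "data:text/html;charset=utf-8,%3Cp%3Ecaf%C3%A9%3C%2Fp%3E"
with urllib.request.urlopen(url) as response:
    raw = response.read()  # bytes: b'<p>caf\xc3\xa9</p>'

# Fix 1: keep the bytes and open the file in binary mode ("wb")
with open("source_code_bytes.txt", "wb") as source_code:
    source_code.write(raw)

# Fix 2: decode to str first, then write in text mode with an explicit encoding
text = raw.decode("utf-8")  # assumes the page is UTF-8
with open("source_code_text.txt", "w", encoding="utf-8") as source_code:
    source_code.write(text)
```

Fix 1 never needs to know the page's encoding; Fix 2 gives you a str you can search or parse before saving, at the cost of having to pick the right charset.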

score 0 · Answer 3 · answered Oct 13 '17 at 01:02

This is the quickest way:

import urllib.request

a = input()  # input() already returns a str
url = "http://" + a
urllib.request.urlretrieve(url, 'page.txt')  # saves the raw page bytes to page.txt

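Bear in mind that not every site is served over http:// (many use https://), and that input is a function, so it must be called as input(). The question's str(input) converts the function object itself to a string instead of reading what the user types.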
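Both points can be handled before calling urlretrieve. A minimal sketch, assuming the user types either a bare host name such as example.com or a full URL: normalize_url and save_page are hypothetical helpers, and the "://" check is a deliberately simple heuristic rather than full URL parsing.

```python
import urllib.request


def normalize_url(address: str, default_scheme: str = "http") -> str:
    """Add a scheme only if the user did not type one (hypothetical helper)."""
    address = address.strip()
    if "://" in address:
        return address  # already starts with http://, https://, etc.
    return f"{default_scheme}://{address}"


def save_page(address: str, filename: str = "page.txt") -> str:
    """Download `address` into `filename` and return the local path (hypothetical helper)."""
    # urlretrieve writes the response body to `filename` byte for byte
    path, _headers = urllib.request.urlretrieve(normalize_url(address), filename)
    return path
```

With this, save_page(input()) accepts both example.com and https://example.com. Because urlretrieve stores the raw bytes, page.txt ends up in whatever encoding the site uses.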

answered by Xantium
