import urllib.request

url = "http://" + str(input)
t = urllib.request.urlopen(url)

How can I save the source code of any website to a .txt file? I'm using Python 3.

Xantium
Algoritm

3 Answers


There are multiple ways you can get this done.

Step 1: Getting the data

You can do this with any HTTP library you like; my personal favorite is requests:

import requests

# Some sites reject requests that lack a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0'}
html_data = requests.get('Your url goes here', headers=headers)

requests.get returns a Response object, not the page itself. To get the HTML as a string, use its text attribute:

html_data = html_data.text

Step 2: Saving this data into a text file on the local machine

# Open the file at the path you specified in text mode, writing UTF-8.
# 'w' overwrites an existing file; use 'a' if you want to append instead.
with open('your file path goes here', 'w', encoding='utf-8') as file:
    file.write(html_data)  # html_data is already a str, thanks to .text
# Leaving the with-block closes the file automatically, even on errors.

This saves the page's full HTML source to the file at the path you specified.

If you meant saving only the text that is visible on the page, you'll need an HTML parser; I recommend BeautifulSoup.

See the official documentation for requests and BeautifulSoup for details.
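BeautifulSoup remains the recommendation above. If you would rather avoid a third-party dependency, here is a rough stand-in built on the standard library's html.parser module instead (a swapped-in technique, not BeautifulSoup's API). It keeps text nodes and skips the contents of script and style elements; it does not know about CSS-hidden elements, so treat it as a sketch. VisibleTextParser and visible_text are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside script/style
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's visible text, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

To save the result, pass html_data from step 1 to visible_text and write the returned string to a file as in step 2.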

Joshua Briefman

There are tons of videos and tutorials about this, but still:

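import urllib.request

url = "http://" + input()  # build the URL as in the question

# In Python 3, urlopen lives in urllib.request, and read() returns bytes;
# decode them to str (assuming the page is UTF-8) before writing text
t = urllib.request.urlopen(url).read().decode('utf-8')

with open("c:\\source_code.txt", 'w', encoding='utf-8') as source_code:
    source_code.write(t)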
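A slightly more robust sketch of the same urllib approach: it decodes the downloaded bytes using the charset the server declares in its Content-Type header, and falls back to UTF-8 when none is declared. That fallback is an assumption; some pages only declare their encoding in a meta tag, which this ignores. fetch_source and save_source are hypothetical helper names.

```python
import urllib.request


def fetch_source(url: str) -> str:
    """Download url and return its body decoded to a str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # the body arrives as bytes
        # Prefer the charset declared in the Content-Type header; fall back
        # to UTF-8 when none is declared (an assumption, not a guarantee)
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes don't match the charset
    return raw.decode(charset, errors="replace")


def save_source(url: str, path: str) -> None:
    """Write the decoded page source to path as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(fetch_source(url))
```

For example, save_source("http://example.com", "source_code.txt") writes that page's HTML to source_code.txt.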
Belial

This is the quickest way:

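import urllib.request

a = input()  # input() already returns a str, so str() is unnecessary
url = "http://" + a
# Download the page at url and save its raw bytes to page.txt
urllib.request.urlretrieve(url, 'page.txt')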

Bear in mind that not every site uses http:// (many use https://), and that input is a function, so it must be called as input().
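One way to handle the http:// versus https:// issue is to prepend a scheme only when the user didn't type one. This sketch assumes that anything containing "://" already names its scheme and defaults to https otherwise; build_url and save_page are hypothetical names. save_page uses urlretrieve, which returns a (filename, headers) tuple.

```python
import urllib.request


def build_url(address: str, default_scheme: str = "https") -> str:
    """Prepend a scheme only when the address doesn't already have one."""
    # Assumption: an address containing "://" already specifies its scheme
    if "://" in address:
        return address
    return f"{default_scheme}://{address}"


def save_page(address: str, dest: str = "page.txt") -> str:
    """Download address to dest and return the local filename."""
    url = build_url(address)
    # urlretrieve returns (local_filename, headers); only the name is kept
    filename, _headers = urllib.request.urlretrieve(url, dest)
    return filename
```

With this, save_page(input()) accepts both "example.com" and "http://example.com".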

Xantium