Python: read a file from a web URL

Question

I am currently trying to read a txt file from a website.

My script so far is:

webFile = urllib.urlopen(currURL)

This way, I can work with the file. However, when I try to store the file (in webFile), I only get a reference to the socket. Another solution I tried was to use read():

webFile = urllib.urlopen(currURL).read()

However, this seems to remove the formatting (\n, \t, etc.).

If I open the file like this:

webFile = urllib.urlopen(currURL)

I can read it line by line:

for line in webFile:
    print line

This should result in:

"this" 
"is" 
"a"
"textfile"

But I get:

't'
'h'
'i'
...

I wish to get the file on my computer, but maintain the format at the same time.

http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python. Just take webFile and write it to a file. – postelrich, Oct 06 '15 at 13:56
Is there no way of doing it without having to first write it to a local file? – mat, Oct 06 '15 at 13:59

score 8 · Accepted Answer · answered Oct 06 '15 at 14:02

You should use readlines() to get the response as a list of whole lines:

response = urllib.urlopen(currURL)
lines = response.readlines()
for line in lines:
    .
    .

However, I strongly recommend using the requests library: http://docs.python-requests.org/en/latest/
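
For readers on Python 3, where urllib.urlopen no longer exists, the readlines() approach above might look roughly like the sketch below, using only the standard library. The helper name read_lines and the UTF-8 default are assumptions made for illustration, not part of the original answer.

```python
from urllib.request import urlopen


def read_lines(url, encoding="utf-8"):
    """Return the text at `url` as a list of lines without trailing newlines."""
    # urlopen returns a file-like response; the with-block closes it afterwards.
    with urlopen(url) as response:
        raw = response.read()  # bytes, with newlines and tabs intact
    # Decode to str, then split on line boundaries.
    return raw.decode(encoding).splitlines()
```

Calling read_lines(currURL) and looping over the result should then yield one whole line per iteration, provided the file really is encoded as assumed.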

Pasqual Guerrero

score 1 · Answer 2 · answered Oct 06 '15 at 14:00

This happens because read() returns a single string, and iterating over a string yields one character at a time.

Why not save the whole file at once?

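import urllib

# Python 2 API: fetch the whole response body as one string
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()

# Write the text to a local file; the with-block closes it automatically
with open('destination.txt', 'w') as f:
    f.write(txt)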

If you really want to loop over the file line by line, use txt = webf.readlines() and iterate over that.
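
The difference between iterating over the string from read() and iterating over its lines can be seen without any network access. In this small sketch the sample text is an assumption that mirrors the example in the question:

```python
# Stand-in for what read() returns: one string, newlines still present.
text = "this\nis\na\ntextfile\n"

chars = [c for c in text]   # iterating over a str yields single characters
lines = text.splitlines()   # splitting on line boundaries yields whole lines

print(chars[:4])  # ['t', 'h', 'i', 's']
print(lines)      # ['this', 'is', 'a', 'textfile']
```

So the formatting is not lost by read(); it only looks that way when the string is looped over character by character.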

Noxeus

module 'urllib' has no attribute 'urlopen' – Raimundo Baravaglio Sep 28 '21 at 13:17
I think I wrote this in Python 2. See here: https://stackoverflow.com/questions/25863101/python-urllib-urlopen-not-working – Noxeus Sep 29 '21 at 14:45
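
As the comments above note, the answer's code targets Python 2. A possible Python 3 version of "save the whole file at once" is sketched below; the function name save_url is hypothetical, and the file is written in binary mode on the assumption that the bytes should be stored unchanged.

```python
import shutil
from urllib.request import urlopen


def save_url(url, destination):
    """Stream the resource at `url` into the local file `destination`."""
    # Binary mode writes the bytes as received, so \n and \t are preserved.
    with urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Using shutil.copyfileobj copies in chunks, which avoids holding a large file in memory all at once.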

score 0 · Answer 3 · answered Oct 06 '15 at 14:02

If you're just trying to save a remote file to your local server as part of a Python script, you could use the PycURL library to download and save it without parsing it. More info: http://pycurl.sourceforge.net

Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:

import urllib

# Open the remote file (Python 2 API; Python 3 moved this to urllib.request.urlopen)
webFile = urllib.urlopen(currURL)

# Read the file contents into a variable
file_contents = webFile.read()
print(file_contents)
# Output: This will be the file contents

# Then write to a new local file; the with-block closes it afterwards
with open('local file.txt', 'w') as f:
    f.write(file_contents)

If neither applies, please update the question to clarify.

score 0 · Answer 4 · answered Jun 20 '21 at 09:57

You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.

!pip install wget

import wget 
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java" 
wget.download(url, 'myFile.java')
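
If installing the third-party wget package is not an option, the same download, read, then delete workflow can be approximated with the standard library. This is only a sketch: the helper name read_via_temp_file, the .txt suffix, and the UTF-8 default are assumptions for illustration.

```python
import os
import tempfile
from urllib.request import urlretrieve


def read_via_temp_file(url, encoding="utf-8"):
    """Download `url` to a temporary file, return its text, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # urlretrieve opens the path itself
    try:
        urlretrieve(url, path)  # save the remote file locally
        with open(path, encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)  # clean up whether or not the read succeeded
```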
