3

I am currently trying to read a txt file from a website.

My script so far is:

webFile = urllib.urlopen(currURL)

This lets me work with the file. However, when I try to store the file's contents (in webFile), I only get a reference to the socket. Another thing I tried was calling read():

webFile = urllib.urlopen(currURL).read()

However, this seems to remove the formatting (\n, \t, etc.).

If I open the file like this:

 webFile = urllib.urlopen(currURL)

I can read it line by line:

for line in webFile:
    print line

This should result in:

"this" 
"is" 
"a"
"textfile"

But I get:

't'
'h'
'i'
...

I wish to get the file on my computer, but maintain the format at the same time.

Darshan Chaudhary
mat
  • See http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python. Just take webFile and write it to a file. – postelrich Oct 06 '15 at 13:56
  • Is there no way of doing it without having to first write it to a local file? – mat Oct 06 '15 at 13:59

4 Answers

8

You should use readlines() to read the file as a list of whole lines:

import urllib

response = urllib.urlopen(currURL)
lines = response.readlines()  # list of lines, each still ending in "\n"
for line in lines:
    pass  # process each line here

However, I strongly recommend using the requests library instead: http://docs.python-requests.org/en/latest/
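
For reference, here is roughly what the readlines() approach looks like on Python 3, where urlopen lives in urllib.request and returns bytes rather than text. This is only a minimal sketch: the helper name fetch_lines, the UTF-8 default, and the data: URL (used so the example runs without a network connection) are my assumptions, not part of the code above.

```python
from urllib.request import urlopen


def fetch_lines(url, encoding="utf-8"):
    """Return the resource at `url` as a list of text lines (hypothetical helper)."""
    with urlopen(url) as response:
        # readlines() yields bytes on Python 3; each line keeps its trailing "\n".
        return [raw.decode(encoding) for raw in response.readlines()]


# Assumption: a data: URL stands in for currURL so the sketch runs offline.
demo_url = "data:text/plain;charset=utf-8,this%0Ais%0Aa%0Atextfile%0A"
lines = fetch_lines(demo_url)
for line in lines:
    print(repr(line))
```

Printing with repr() makes the newline at the end of each line visible, which shows that the formatting is still there.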

1

This happens because you are iterating over a string, which yields one character at a time.
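
A small sketch of the difference, using a literal string in place of what read() would return (the sample text is made up to match the question):

```python
# Stand-in for the result of read(): one string with its newlines intact.
text = "this\nis\na\ntextfile\n"

# Iterating over a string yields single characters.
chars = [ch for ch in text][:3]

# splitlines() yields whole lines instead (without the trailing newline).
lines = text.splitlines()

print(chars)
print(lines)
```

So read() does not strip the formatting; the characters only appear one per line because the loop walks the string itself.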

Why not save the whole file at once?

import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()

f = open('destination.txt', 'w+')
f.write(txt)
f.close()

If you really want to loop over the file line by line, use txt = webf.readlines() and iterate over that.
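
On Python 3 the same idea might look like the sketch below. It reads the body as bytes and writes it in binary mode, so tabs and newlines end up on disk unchanged. The data: URL and the temporary directory are assumptions so that the example runs offline and does not touch your working directory.

```python
import os
import tempfile
from urllib.request import urlopen

# Assumption: a data: URL stands in for the real address so this runs offline.
url = "data:text/plain;charset=utf-8,col1%09col2%0Aa%09b%0A"

with urlopen(url) as webf:
    data = webf.read()  # bytes, including every "\t" and "\n"

# Assumption: a temporary directory replaces the working directory used above.
destination = os.path.join(tempfile.mkdtemp(), "destination.txt")
with open(destination, "wb") as f:  # binary mode writes the bytes unchanged
    f.write(data)

# Read the file back to confirm nothing was altered.
with open(destination, "rb") as f:
    saved = f.read()
print(repr(saved))
```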

Noxeus
0

If you're just trying to save a remote file to your local server as part of a Python script, you could use the PycURL library to download and save it without parsing it. More info: http://pycurl.sourceforge.net


Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:

import urllib

# Assign the open response to a variable
webFile = urllib.urlopen(currURL)

# Read the file contents into a variable
file_contents = webFile.read()
print(file_contents)
# Output: the file contents

# Then write them to a new local file
f = open('local file.txt', 'w')
f.write(file_contents)
f.close()

If neither applies, please update the question to clarify.
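
If the goal is only to save the file without inspecting it, the standard library can also stream the response straight to disk. The sketch below uses urllib.request and shutil.copyfileobj on Python 3 rather than PycURL; the helper name download, the data: URL, and the temporary directory are assumptions made so it runs offline.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download(url, path):
    """Stream the response body at `url` into `path` (hypothetical helper)."""
    with urlopen(url) as response, open(path, "wb") as out:
        # Copies in chunks, so the body is never held in memory as a whole.
        shutil.copyfileobj(response, out)


# Assumptions: a data: URL and a temporary directory keep the sketch offline.
demo_url = "data:text/plain;charset=utf-8,line%20one%0Aline%20two%0A"
target = os.path.join(tempfile.mkdtemp(), "local file.txt")
download(demo_url, target)

with open(target, "rb") as f:
    contents = f.read()
print(repr(contents))
```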

Phil Sheard
0

You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.

!pip install wget

import wget 
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java" 
wget.download(url, 'myFile.java')
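
The download, read, then delete workflow can also be sketched with the standard library alone on Python 3. This is not the wget package; it uses urllib.request plus tempfile, and the data: URL and the file name are assumptions so the example runs without a network connection.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Assumption: a data: URL stands in for the real file so this runs offline.
url = "data:text/plain;charset=utf-8,first%0Asecond%0A"

with tempfile.TemporaryDirectory() as tmp_dir:
    local_path = Path(tmp_dir) / "myFile.txt"

    # Download: write the raw bytes to the temporary file.
    with urlopen(url) as response:
        local_path.write_bytes(response.read())

    # Read it back like any local file.
    with local_path.open(encoding="utf-8") as f:
        lines = f.read().splitlines()

# The directory and the downloaded file are deleted when the block exits.
still_exists = local_path.exists()
print(lines, still_exists)
```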
Udith Indrakantha