I'm trying to get images from a website, but I'm getting an error.

Here is the code:

url = 'http://www.techradar.com/news/internet/web/12-best-places-to-get-free-images-for-your-site-624818'
image = urlopen(url).read()
patFinderImage = re.compile('.jpg')
imgUrl = re.findall('<img src="(.*)" />', url)
outfile = open('abc.htm', 'wb')
outfile.write(imgUrl)
outfile.close

Error:

Traceback (most recent call last):
  File "C:\Users\joh\workspace\new2\newnewurl.py", line 14, in <module>
    outfile.write(imgUrl)
TypeError: 'list' does not support the buffer interface
  • `re.findall` returns a list, how do you want it to be printed to the file, one result per line, with a delimiter or just all results concatenated directly one after another? –  Feb 16 '14 at 01:10
  • Actually, yes, I want all the images, one result per line, with their names. For example, if the image name is /link/baloon.jpg, it must be stored with that name. – user3299370 Feb 16 '14 at 01:22
  • @user3299370: Put that information into the question. Ideally, show some (very small) sample input and the expected output. – abarnert Feb 16 '14 at 01:24
  • As a side note: `outfile.close` doesn't do anything; you need to _call_ the `close` method, not just refer to it, as in `outfile.close()`. Or, better, use a `with` statement instead of an explicit `close` call. – abarnert Feb 16 '14 at 01:27
  • Also you might want to search `image` instead of `url`... –  Feb 16 '14 at 01:39
  • Right now I'm just focusing on this error; later I'll put it together with the other part of the code. I'll consider your suggestions. – user3299370 Feb 16 '14 at 01:39
  • Yes, Nabla, I need images only. – user3299370 Feb 16 '14 at 01:44
  • @user3299370: My answer explains why you get this error. There are three different problems all getting in the way. If you don't want to actually fix those problems, or understand them, and just want to eliminate this error, that's easy: just don't ever call `outfile.write` and it won't happen… – abarnert Feb 16 '14 at 01:50

1 Answer

re.findall returns a list of found strings. So, imgUrl is a list.

You can't write a list of strings to a file, only a string. Hence the error message.

If you want to write out the string representation of the list (which is easy, but unlikely to be useful), you can do this:

outfile.write(str(imgUrl))

If you want to write just the first URL, which is a string, you can:

outfile.write(imgUrl[0])

If you want to write all of the URLs, one on each line:

for url in imgUrl:
    outfile.write(url + '\n')

Or, since it's HTML and the whitespace doesn't matter, you can write them all run together:

outfile.write(''.join(imgUrl))

You then have a second problem. For some reason, you've opened the file in binary mode. I don't know why you're doing this, but if you do, you can only write bytes to the file, not strings. But you don't have a list of bytes; you have a list of strings. So, you need to encode those strings into bytes. For example:

for url in imgUrl:
    outfile.write(url.encode('utf-8') + b'\n')

Or—much better—just don't open the file in binary mode:

outfile = open('abc.htm', 'w')

If you want to specify an explicit encoding, you can still do that without using binary mode:

outfile = open('abc.htm', 'w', encoding='utf-8')
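
Putting the text-mode advice together, a minimal sketch might look like the following. The `write_urls` helper, the `img_urls` sample list, and the temp-directory output path are made up for illustration; in your script the list would be whatever `re.findall` returned. The `with` statement closes the file for you, so there is no `close` call to forget.

```python
import os
import tempfile


def write_urls(urls, path):
    """Write each URL on its own line, in text mode with an explicit encoding."""
    # 'w' is text mode, so write() accepts str; the with block closes the file.
    with open(path, 'w', encoding='utf-8') as outfile:
        for url in urls:
            outfile.write(url + '\n')


# Hypothetical sample data standing in for the result of re.findall.
img_urls = ['/link/baloon.jpg', '/link/kite.jpg']

# Written to the temp directory here only so the sketch has no side effects
# in the current working directory.
out_path = os.path.join(tempfile.gettempdir(), 'abc.htm')
write_urls(img_urls, out_path)
```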

You may also have a third problem. From your comments, it appears that imgUrl[0] gives you an IndexError. That means that it's empty. Which means your regex is not actually finding any URLs to write in the first place. In that case, you obviously can't successfully write them out (unless you're expecting an empty file).

And the reason (or at least a reason) the regex is not finding anything is that you're not actually searching the downloaded HTML (which you've stored in image) but the URL to that HTML (which you've stored in url):

imgUrl = re.findall('<img src="(.*)" />', url)

… and obviously there are no matches for your regexp in the string 'http://www.techradar.com/news/internet/web/12-best-places-to-get-free-images-for-your-site-624818'.
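
Here is a rough sketch of searching the page contents rather than the URL. The `find_img_urls` helper and the sample HTML are invented for illustration. In Python 3, `urlopen(url).read()` returns bytes, so in the real script you would decode it first (UTF-8 is an assumption about that page). I've also loosened the pattern so it is non-greedy and doesn't require `src` to be the first attribute; that is my guess at what the markup looks like, and a regex is still a fragile way to read HTML compared with a real parser.

```python
import re

# Matches the src value of an <img> tag, allowing other attributes before src.
# [^"]* stops at the closing quote, so one match cannot swallow several tags.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"')


def find_img_urls(html):
    """Return the src value of every <img> tag found in an HTML string."""
    return IMG_SRC.findall(html)


# Hypothetical HTML; in the real script this would be something like
# urlopen(url).read().decode('utf-8'), assuming the page is UTF-8.
sample_html = (
    '<p><img src="/link/baloon.jpg" />'
    '<img alt="x" src="/link/kite.jpg"></p>'
)
img_urls = find_img_urls(sample_html)

# Searching the URL string itself, as the original code does, finds nothing.
page_url = 'http://www.techradar.com/news/internet/web/12-best-places-to-get-free-images-for-your-site-624818'
no_matches = find_img_urls(page_url)
```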

abarnert
  • Traceback (most recent call last): File "C:\Users\joh\workspace\new2\newnewurl.py", line 14, in outfile.write(str(imgUrl)) TypeError: 'str' does not support the buffer interface ____________________________________________________________ Traceback (most recent call last): File "C:\Users\joh\workspace\new2\newnewurl.py", line 14, in outfile.write(imgUrl[0]) IndexError: list index out of range – user3299370 Feb 16 '14 at 01:19
  • Traceback (most recent call last): File "C:\Users\joh\workspace\new2\newnewurl.py", line 14, in outfile.write(imgUrl + '\n') TypeError: can only concatenate list (not "str") to list ___________________________________________________________________ Traceback (most recent call last): File "C:\Users\joh\workspace\new2\newnewurl.py", line 14, in outfile.write(''.join(imgUrl)) TypeError: 'str' does not support the buffer interface – user3299370 Feb 16 '14 at 01:20
  • The above are the errors generated by your suggestions, respectively. – user3299370 Feb 16 '14 at 01:21
  • @user3299370: Do not try to paste tracebacks into comments here; it makes them unreadable. But I can guess what the problem is; I'll edit to explain. – abarnert Feb 16 '14 at 01:21
  • @abarnert Your last paragraph: Actually OP is simply searching the wrong string `url` instead of `image`. –  Feb 16 '14 at 01:38
  • @Nabla: Ah, good catch; thanks, I'll edit the answer. No guarantee that this is the _only_ problem, but it obviously won't work without that fix. :) – abarnert Feb 16 '14 at 01:46