re.findall returns a list of found strings, so imgUrl is a list.
You can't write a list of strings to a file, only a string. Hence the error message.
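Here is a small sketch of the failure. It uses io.StringIO as an in-memory stand-in for outfile and a made-up HTML snippet, so nothing touches the disk. The exact TypeError wording differs between StringIO and a real file opened with 'w', but both reject a list.

```python
import io
import re

# Hypothetical HTML snippet standing in for the downloaded page.
html = '<img src="a.png" />\n<img src="b.png" />'

imgUrl = re.findall('<img src="(.*)" />', html)
print(type(imgUrl).__name__, imgUrl)  # list ['a.png', 'b.png']

outfile = io.StringIO()  # in-memory stand-in for a file opened in text mode
write_error = None
try:
    outfile.write(imgUrl)  # a list, not a str, so this raises TypeError
except TypeError as exc:
    write_error = exc
    print('TypeError:', exc)

outfile.write(imgUrl[0])  # a single str is accepted
```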
If you want to write out the string representation of the list (which is easy, but unlikely to be useful), you can do this:
outfile.write(str(imgUrl))
If you want to write just the first URL, which is a string, you can:
outfile.write(imgUrl[0])
If you want to write all of the URLs, one on each line:
for url in imgUrl:
    outfile.write(url + '\n')
Or, since it's HTML and the whitespace doesn't matter, you can write them all run together:
outfile.write(''.join(imgUrl))
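To compare what each of those options actually writes, this sketch runs all four against separate in-memory files (io.StringIO). The URLs are hypothetical placeholders shaped like a re.findall result.

```python
import io

# Hypothetical list, shaped like what re.findall returns.
imgUrl = ['http://example.com/a.png', 'http://example.com/b.png']

# 1. String representation of the whole list (rarely useful).
out = io.StringIO()
out.write(str(imgUrl))
repr_text = out.getvalue()

# 2. Just the first URL.
out = io.StringIO()
out.write(imgUrl[0])
first_text = out.getvalue()

# 3. One URL per line.
out = io.StringIO()
for url in imgUrl:
    out.write(url + '\n')
lines_text = out.getvalue()

# 4. All URLs run together.
out = io.StringIO()
out.write(''.join(imgUrl))
joined_text = out.getvalue()

print(lines_text, end='')
```

Option 1 gives you Python list syntax (brackets and quotes) in the file, which is why it is rarely what you want.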
You then have a second problem. For some reason, you've opened the file in binary mode. I don't know why you're doing this, but if you do, you can only write bytes to the file, not strings. But you don't have a list of bytes, you have a list of strings. So you need to encode those strings into bytes. For example:
for url in imgUrl:
    outfile.write(url.encode('utf-8') + b'\n')
Or—much better—just don't open the file in binary mode:
outfile = open('abc.htm', 'w')
If you want to specify an explicit encoding, you can still do that without using binary mode:
outfile = open('abc.htm', 'w', encoding='utf-8')
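As a sanity check that the two approaches produce the same file, this sketch writes the URLs once in binary mode (encoding by hand) and once in text mode with encoding='utf-8', then reads both back. The temporary directory, the file names, and the URLs are all hypothetical; with statements are used so the files are closed before being read.

```python
import os
import tempfile

# Hypothetical URLs, shaped like what re.findall returns.
imgUrl = ['http://example.com/a.png', 'http://example.com/b.png']

with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, 'binary.htm')  # hypothetical file names
    text_path = os.path.join(tmp, 'text.htm')

    # Binary mode: write() accepts only bytes-like objects.
    binary_str_error = None
    with open(binary_path, 'wb') as outfile:
        try:
            outfile.write(imgUrl[0])  # a str, so this raises TypeError
        except TypeError as exc:
            binary_str_error = exc
        for url in imgUrl:
            outfile.write(url.encode('utf-8') + b'\n')  # encode each str first

    # Text mode with an explicit encoding: write the str values directly.
    with open(text_path, 'w', encoding='utf-8') as outfile:
        for url in imgUrl:
            outfile.write(url + '\n')

    # Read both back as text; universal newlines keep the comparison portable.
    with open(binary_path, encoding='utf-8') as f:
        binary_text = f.read()
    with open(text_path, encoding='utf-8') as f:
        text_text = f.read()

print(binary_text == text_text)  # True
```

Text mode does the encoding for you, which is why it is the simpler choice here.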
You may also have a third problem. From your comments, it appears that imgUrl[0] gives you an IndexError. That means the list is empty, which means your regex is not actually finding any URLs to write in the first place. In that case, you obviously can't successfully write them out (unless you're expecting an empty file).
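If an empty result is possible, you can check for it before indexing. This is a minimal sketch; first_img_url is a hypothetical helper, and the pattern is the one from your code.

```python
import re


def first_img_url(html):
    """Return the first <img src> value in html, or None if there is none.

    Hypothetical helper for illustration only.
    """
    imgUrl = re.findall('<img src="(.*)" />', html)
    if not imgUrl:
        # An empty list is exactly the case where imgUrl[0] raises IndexError.
        return None
    return imgUrl[0]


print(first_img_url('<p>no images here</p>'))  # None
print(first_img_url('<img src="a.png" />'))    # a.png
```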
And the reason (or at least a reason) the regex is not finding anything is that you're not actually searching the downloaded HTML (which you've stored in image) but the URL to that HTML (which you've stored in url):
imgUrl = re.findall('<img src="(.*)" />', url)
… and obviously there are no matches for your regexp in the string 'http://www.techradar.com/news/internet/web/12-best-places-to-get-free-images-for-your-site-624818'.
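Here is a sketch of the difference, assuming image holds the page as a str. The HTML below is a made-up stand-in, since the real page isn't shown. If image came straight from urllib.request.urlopen(...).read() it will be bytes, and you would need to decode it (for example image.decode('utf-8')) before using a str pattern on it.

```python
import re

url = 'http://www.techradar.com/news/internet/web/12-best-places-to-get-free-images-for-your-site-624818'

# Hypothetical stand-in for the downloaded page.
image = '<html><body>\n<img src="photo1.jpg" />\n<img src="photo2.jpg" />\n</body></html>'

pattern = '<img src="(.*)" />'

from_url = re.findall(pattern, url)     # searching the URL: nothing to find
from_html = re.findall(pattern, image)  # searching the HTML: finds the src values

print(from_url)   # []
print(from_html)  # ['photo1.jpg', 'photo2.jpg']
```

Note that (.*) is greedy: if two img tags sit on the same line, one match will swallow both. The non-greedy (.*?) avoids that.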