First, there are plenty of related questions about extracting GIFs from a URL, but the majority of them concern languages other than Python. Second, a Google search turns up many examples of how to do this using requests and a parser such as lxml or BeautifulSoup. However, I think my problem is specific to this URL, and I cannot figure out why the image in question does not have a specific URL attached to it (http://cactus.nci.nih.gov/chemical/structure/3-Methylamino-1-%28thien-2-yl%29-propane-1-ol/image).

This is what I have tried:

import urllib
import requests

molecule_name = "3-Methylamino-1-(thien-2-yl)-propane-1-ol"
molecule = urllib.pathname2url(molecule_name)  # Python 2: percent-encodes the parentheses
response = requests.get("http://cactus.nci.nih.gov/chemical/structure/" + molecule + "/image")
response.encoding = 'ISO-8859-1'
print type(response.content)

and I just get back a string that says GIF87au. I know it has something to do with the GIF being binary data, but I can't quite work out how to download the GIF file from that particular page using the script.

Furthermore, if I do manage to download the GIF file, what are the best modules for building tables (CSV or Excel style) with GIF files embedded in, for example, the last column?

Brian Tompsett - 汤莱恩
user1998510
  • What do you mean, "why it doesn't have a specific url attached to it"? It does have a URL, you've used it in your script. – Daniel Roseman Nov 11 '15 at 09:43
  • I figured it would have an image.gif addition at the end if it pointed to the exact GIF image and not to the page containing the GIF image. – user1998510 Nov 11 '15 at 10:32
  • Why? A file can be called anything you like; as long as it is transferred with the correct mime type of `image/gif`, which it is, the browser will know what to do with it. – Daniel Roseman Nov 11 '15 at 10:49
  • Ah, ok, that makes sense. I am still having trouble downloading the file via the script above, though. The "response.text" option gives me the image in text format, but when I save this to a file and try to open it with the browser, it does not work. Comparing it to the GIF image downloaded manually, I can see that it appears to be an encoding issue. Any ideas? – user1998510 Nov 11 '15 at 11:27
  • Don't use `response.text`: that's for, well, text. Use `response.content`, see [the request docs](http://requests.readthedocs.org/en/latest/user/quickstart/#binary-response-content). – Daniel Roseman Nov 11 '15 at 11:28
  • I initially used response.content in the script, but for that I just get the string output "GIF87aú". When I write it to a file and open it in Notepad, both files look EXACTLY the same, but one opens properly and the other doesn't (in both Internet Explorer and Windows Photo Viewer). – user1998510 Nov 11 '15 at 11:35

1 Answer

As far as I can tell your code is working for me.

import urllib
import requests

molecule_name = "3-Methylamino-1-(thien-2-yl)-propane-1-ol"
molecule = urllib.pathname2url(molecule_name)  # Python 2: percent-encodes the parentheses
response = requests.get("http://cactus.nci.nih.gov/chemical/structure/" + molecule + "/image")
response.encoding = 'ISO-8859-1'
print len(response.content)  # number of bytes in the GIF payload

It outputs "1080".
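
That 1080 is the number of bytes in `response.content`, which is the raw GIF data itself; `GIF87a` is just the file signature at the start of it. Below is a minimal Python 3 sketch of getting those bytes onto disk. It assumes the important part is opening the file in binary mode, since writing image bytes in text mode on Windows can alter them, which may be what leads to an image that looks identical in a text editor but will not open. `build_image_url` and `save_gif` are hypothetical helper names, and `urllib.parse.quote` is used here in place of Python 2's `urllib.pathname2url`.

```python
from pathlib import Path
from urllib.parse import quote


def build_image_url(molecule_name):
    """Build the image URL for a molecule name, percent-encoding the name."""
    base = "http://cactus.nci.nih.gov/chemical/structure/"
    # safe="" also encodes "/" so the name stays a single path segment.
    return base + quote(molecule_name, safe="") + "/image"


def save_gif(data, path):
    """Write raw GIF bytes to disk in binary mode; return the byte count."""
    # Every GIF file starts with one of these two signatures.
    if not data.startswith((b"GIF87a", b"GIF89a")):
        raise ValueError("payload does not start with a GIF signature")
    # write_bytes opens the file in binary mode, so no newline translation
    # or text encoding is applied to the image data.
    return Path(path).write_bytes(data)
```

With requests, this could be used roughly as `save_gif(requests.get(build_image_url(molecule_name)).content, "molecule.gif")`, where the output filename is an arbitrary choice.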

As for the second task, putting it into a document, I would use xlsxwriter like this:

import xlsxwriter


# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('molecules.xlsx')
worksheet = workbook.add_worksheet()

# Input data
worksheet.write(0, 0, "My molecule") # A1 == 0, 0
worksheet.insert_image('B1', 'molecule1234.png')

workbook.close()

See http://xlsxwriter.readthedocs.org/index.html
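
To get closer to the table layout asked about, with names in one column and images in the last, something along these lines might work. It is only a sketch: `write_molecule_table` is a hypothetical helper, the column widths and row height are arbitrary guesses, and each image is assumed to be available already as PNG bytes in memory. It relies on the `image_data` option of `insert_image()`, which takes an `io.BytesIO` object instead of reading the file from disk.

```python
import io

import xlsxwriter


def write_molecule_table(xlsx_path, rows):
    """Write (name, png_bytes) pairs: name in column A, image in column B."""
    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet()
    worksheet.set_column(0, 0, 45)  # widen the name column (arbitrary width)
    worksheet.set_column(1, 1, 40)  # leave room for the image (arbitrary width)
    for row, (name, png_bytes) in enumerate(rows):
        worksheet.set_row(row, 200)  # row height in points; tune to the image
        worksheet.write(row, 0, name)
        # A filename must still be passed, but the pixels come from image_data.
        worksheet.insert_image(row, 1, name + ".png",
                               {"image_data": io.BytesIO(png_bytes)})
    workbook.close()
    return xlsx_path
```

Images in XlsxWriter float over the cells rather than living inside them, so the row height and column width have to be sized by hand to make each image line up with its row.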

You will have to convert the .gif to .png (or JPEG), because xlsxwriter does not currently support GIFs (as jmcnamara pointed out). For how to do that with PIL, see "How to change gif file to png file using python pil".
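
A minimal sketch of that conversion, assuming Pillow (the maintained fork of PIL) is installed. It decodes the downloaded bytes in memory by wrapping them in `io.BytesIO`, which is also the usual way around PIL complaining that the content is a string rather than an image. `gif_bytes_to_png` is a hypothetical helper name, and only the first frame of the GIF is saved.

```python
import io

from PIL import Image  # Pillow


def gif_bytes_to_png(gif_bytes, png_path):
    """Decode GIF bytes in memory and save the first frame as a PNG file."""
    # Image.open expects a filename or a file-like object, not raw bytes,
    # so the bytes are wrapped in a BytesIO buffer first.
    with Image.open(io.BytesIO(gif_bytes)) as img:
        # GIFs are palette-based; converting to RGBA gives a plain PNG.
        img.convert("RGBA").save(png_path, format="PNG")
    return png_path
```

The resulting PNG path can then be handed to `worksheet.insert_image()` in place of the GIF.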

You can display the GIF in many ways. I would just save it to a file and open it with other software. If you want to view it programmatically, you can use Tkinter, for instance, as in "Play Animations in GIF with Tkinter".

  • Thanks for your prompt answer. You are right, the script does work and I do get 1080 when I do len(response.content). However, I am struggling to understand what that means. What code do I need to add so that I can actually get hold of the GIF file on my computer? Is there any module I can use to view the GIF file in Python as well? Would PIL (Image) do the job? How would you work this into the script? When I try to use PIL, it says the content is a string, not an image. Thank you for introducing me to xlsxwriter. That is going to save my life :). – user1998510 Nov 11 '15 at 10:35
  • Note, XlsxWriter doesn't currently support GIF files so the image will have to be converted to PNG or JPEG. Apart from that it is possible to insert images from a url almost directly in XlsxWriter: [worksheet.insert_image()](http://xlsxwriter.readthedocs.org/worksheet.html#worksheet-insert-image). – jmcnamara Nov 11 '15 at 11:12
  • Thank you for the additional information @jmcnamara. I am still stuck on getting the GIF image to display. I can download it with my original script and save it as a GIF file, but the problem starts when opening it with PIL. I always get the error "IOError: image file is truncated (6 bytes not processed)". – user1998510 Nov 11 '15 at 13:47
  • I have edited in some of your points. Thank you @jmcnamara for pointing out that gif is not supported by xlsxwriter. – Jan Skácel Nov 12 '15 at 13:24