
I am having trouble with the python-docx library. I scraped images from a website and want to add them to a docx file, but I cannot add them directly. I keep getting this error:

File "C:\Python27\lib\site-packages\docx\image\image.py", line 46, in from_file
    with open(path, 'rb') as f:
IOError: [Errno 22] invalid mode ('rb') or filename: 'http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg'

This is my code:

import urllib
import requests
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches
import os


document = Document()

document.add_heading("Megatronics Items Full Search", 0)


FullPage = ['New-Arrivals-2017-6', 'Big-Sales-click-here', 'Arduino-Development-boards',
            'Robotics-and-Copters', 'Breakout-Boards', 'RC-Wireless-communication', 'GSM,-GPS,-RFID,-Wifi',
            'Advance-Development-boards-and-starter-Kits', 'Sensors-and-IMU', 'Solenoid-valves,-Relays,--Switches',
            'Motors,-drivers,-wheels', 'Microcontrollers-and-Educational-items', 'Arduino-Shields',
            'Connectivity-Interfaces', 'Power-supplies,-Batteries-and-Chargers', 'Programmers-and-debuggers',
            'LCD,-LED,-Cameras', 'Discrete-components-IC', 'Science-Education-and-DIY', 'Consumer-Electronics-and-tools',
            'Mechanical-parts', '3D-Printing-and-CNC-machines', 'ATS', 'UPS', 'Internal-Battries-UPS',
            'External-Battries-UPS']

urlp1 = "http://www.arduinopak.com/Prd.aspx?Cat_Name="
URL = urlp1 + FullPage[0]

for n in FullPage:
    URL = urlp1 + n
    page = urllib.urlopen(URL)
    bsObj = BeautifulSoup(page, "lxml")
    panel = bsObj.findAll("div", {"class": "panel"})

    for div in panel:
        titleList = div.find('div', attrs={'class': 'panel-heading'})
        imageList = div.find('div', attrs={'class': 'pro-image'})
        descList = div.find('div', attrs={'class': 'pro-desc'})

        r = requests.get("http://upsats.com/", stream=True)
        data = r.text

        for link in imageList.find_all('img'):
            image = link.get("src")
            image_name = os.path.split(image)[1]
            r2 = requests.get(image)
            with open(image_name, "wb") as f:
                f.write(r2.content)

            print(titleList.get_text(separator=u' '))
            print(imageList.get_text(separator=u''))
            print(descList.get_text(separator=u' '))
            document.add_heading("%s \n" % titleList.get_text(separator=u' '))
            document.add_picture(image, width=Inches(1.5))
            document.add_paragraph("%s \n" % descList.get_text(separator=u' '))

document.save('megapy.docx')

This is not all of the code, just the main part. The problem is getting the pictures I downloaded into the docx: I do not know how to add them. Do I need to convert or reformat them first, and if so, how?

All I know is the problem lies within this code:

document.add_picture(image, width=Inches(1.0))

How do I make this image show up in docx from the URL? What am I missing?

Oliver Queen
  • Possible duplicate of [Add an image in a specific position in the document (.docx) with Python?](https://stackoverflow.com/questions/32932230/add-an-image-in-a-specific-position-in-the-document-docx-with-python) – Liam Aug 07 '17 at 11:36
  • Sorry but that is for positioning. I want to show the images in the docx file, I have downloaded the images from this url: www.arduinopak.com/ but I cannot get the pictures into the docx file. – Oliver Queen Aug 07 '17 at 13:20

1 Answer


Update

I tested with 10 images and got a working docx. When loading many images, one of them raised an error, which I worked around by wrapping the call in a try/except (see below). The resulting megapy.docx was 165 MB and took about 10 minutes to create.

I changed:

with open(image_name, "wb") as f:
    f.write(r2.content)

To:

image = io.BytesIO(r2.content)

And added:

try:
    document.add_picture(image, width=Inches(1.5))
except Exception:
    # Skip any image that python-docx cannot read
    pass
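
The try/except above silently skips every failure. One possible reason a single picture fails is that the server returned something other than an image (an error page, for instance). As a rough sketch only, and not part of my original test, the downloaded bytes could be checked against a few well-known file signatures before calling add_picture. The names `IMAGE_SIGNATURES` and `looks_like_image` are made up for this illustration:

```python
# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
    b"BM",                 # BMP
)


def looks_like_image(data):
    """Return True if `data` starts with one of the known image signatures."""
    # bytes.startswith accepts a tuple and returns True if any prefix matches.
    return data.startswith(IMAGE_SIGNATURES)
```

In the loop this would be used as `if looks_like_image(r2.content):` before wrapping the bytes in `io.BytesIO` and calling `document.add_picture`. A truncated download that still begins with a valid signature passes this check, so keeping the try/except as a fallback still makes sense.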



Use the io library to create file-like objects.

Example that works on Python 2 and 3:

import requests
import io
from docx import Document
from docx.shared import Inches

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Usain_Bolt_Rio_100m_final_2016k.jpg/200px-Usain_Bolt_Rio_100m_final_2016k.jpg'
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)

document = Document()
document.add_picture(image, width=Inches(1.25))
document.save('demo.docx')
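
The reason the example above works is that `document.add_picture` accepts either a file path or a file-like object. A URL string is neither, so python-docx tries to `open()` it as a path, which is the IOError in the question. The sketch below (Python 3) chooses the right form for a given source. The helper names `is_remote` and `picture_source` are hypothetical, and `fetch` is an assumed callable that takes a URL and returns raw bytes, for example `lambda url: requests.get(url).content`:

```python
import io
from urllib.parse import urlparse


def is_remote(source):
    """Return True if `source` looks like an http(s) URL rather than a local path."""
    # A Windows path such as C:\images\a.jpg parses with scheme "c",
    # so only http and https are treated as remote.
    return urlparse(source).scheme in ("http", "https")


def picture_source(source, fetch):
    """Return a value that document.add_picture() can accept.

    `fetch` is a hypothetical callable: URL in, raw bytes out.
    """
    if is_remote(source):
        # Wrap the downloaded bytes in an in-memory file-like object.
        return io.BytesIO(fetch(source))
    # A local path can be handed to add_picture unchanged.
    return source
```

With this, `document.add_picture(picture_source(image, fetch), width=Inches(1.25))` would cover both a scraped `src` URL and a file already saved to disk.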


Anton vBR
  • Thanks so much. I still have some problem. document.add_picture(image, width=Inches(1.0)) `File "C:\Python27\lib\site-packages\docx\document.py", line 79, in add_picture return run.add_picture(image_path_or_stream, width, height)` – Oliver Queen Aug 07 '17 at 16:25
  • @AbbasKhan I wrote a small program once, let me just check how I did – Anton vBR Aug 07 '17 at 16:28
  • I am so sorry to disturb you, friend. I am new to this. Just trying to get my hands ready. – Oliver Queen Aug 07 '17 at 16:34
  • @AbbasKhan I updated my example and it works for me. Doesn't your error message say something else? – Anton vBR Aug 07 '17 at 16:36
  • `File "C:\Python27\lib\site-packages\docx\image\helpers.py", line 88, in _read_bytes raise UnexpectedEndOfFileError docx.image.exceptions.UnexpectedEndOfFileError` – Oliver Queen Aug 07 '17 at 17:01
  • ` r = requests.get("http://upsats.com/Content/Product/img/Product/Large/", stream=True) data = r.text soup = BeautifulSoup(data, 'lxml') image_name = os.path.split(image)[1] print(image_name) r2 = requests.get(image) image = io.BytesIO(r2.content)` – Oliver Queen Aug 07 '17 at 17:03
  • I have updated the question. Please check the code now. – Oliver Queen Aug 09 '17 at 17:29
  • @OliverQueen With small changes I successfully created a docx in Python 3. I used your exact code + my changes and **urllib.request.urlopen(URL)**. – Anton vBR Aug 09 '17 at 20:27
  • Well I have no special setup and your code is working for me. – Anton vBR Aug 10 '17 at 09:50
  • @OliverQueen What errors are you encountering? – Anton vBR Aug 10 '17 at 10:31
  • First, I have an image descriptor error followed by this: File "C:\Python27\lib\site-packages\docx\image\image.py", line 46, in from_file with open(path, 'rb') as f: IOError: [Errno 22] invalid mode ('rb') or filename: 'http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg' – Oliver Queen Aug 10 '17 at 15:01
  • I just saw the edit now... shit... sorry man, running it... OHHH im feeling excited! – Oliver Queen Aug 10 '17 at 15:07
  • Done. Oh man, thanks bro! Now I can access this when I go to China!!! Love you bro. Stay blessed! – Oliver Queen Aug 10 '17 at 18:02
  • @OliverQueen Glad you are happy! Good luck in China I guess :) – Anton vBR Aug 10 '17 at 18:03
  • You have saved this city! – Oliver Queen Aug 10 '17 at 18:26
  • @AntonvBR how could I adapt this to read a local file instead of using an url request response? Perhaps that might solve this issue https://github.com/python-openxml/python-docx/issues/187 – abu May 22 '21 at 08:01
  • I’m sorry but that won’t fix the issue with the local file. – Anton vBR May 22 '21 at 09:13