2

I found some Python 2 code to extract images from Excel files.

I have a very fundamental question: where should I specify the path to my target Excel file?

Or does it only work with a workbook that is already open in Excel?

import win32com.client       # Need pywin32 from pip
from PIL import ImageGrab    # Need PIL as well
import os

excel = win32com.client.Dispatch("Excel.Application")
workbook = excel.ActiveWorkbook

wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)

#print "Extracting images from %s" % wb_path
print("Extracting images from", wb_path)

image_no = 0

for sheet in workbook.Worksheets:
    for n, shape in enumerate(sheet.Shapes):
        if shape.Name.startswith("Picture"):
            # Some debug output for console
            image_no += 1
            print("---- Image No. %07i ----" % image_no)

            # Sequence number the pictures, if there's more than one
            num = "" if n == 0 else "_%03i" % n

            filename = sheet.Name + num + ".jpg"
            file_path = os.path.join(wb_folder, filename)

            #print "Saving as %s" % file_path    # Debug output
            print('Saving as', file_path)

            shape.Copy() # Copies from Excel to Windows clipboard

            # Use PIL (python imaging library) to save from Windows clipboard
            # to a file
            image = ImageGrab.grabclipboard()
            image.save(file_path,'jpeg')
John

3 Answers

10

You can grab images from an existing Excel file like this:

from PIL import ImageGrab
import win32com.client as win32

excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'C:\Users\file.xlsx')

for sheet in workbook.Worksheets:
    for i, shape in enumerate(sheet.Shapes):
        if shape.Name.startswith('Picture'):  # or try 'Image'
            shape.Copy()
            image = ImageGrab.grabclipboard()
            image.save('{}.jpg'.format(i+1), 'jpeg')
Jean-François Fabre
Alderven
  • It seems many of the images have names that don't start with 'Picture'. I guess we could control which images get extracted more precisely by examining the shape names closely. Thanks a lot for the hack. – John Mar 05 '19 at 02:50
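
Following up on the comment about shape names: one alternative to matching on the name is to check the shape's Type property against the MsoShapeType constants from the Office object model. This is only a sketch; the helper name is_picture is made up for illustration, and it assumes that embedded pictures report msoPicture (13) and linked ones msoLinkedPicture (11).

```python
# MsoShapeType values from the Office object model.
MSO_LINKED_PICTURE = 11
MSO_PICTURE = 13


def is_picture(shape):
    """Return True if a COM Shape object is an embedded or linked picture.

    Hypothetical helper: it only looks at shape.Type, so it works no matter
    what the shape happens to be named.
    """
    return shape.Type in (MSO_PICTURE, MSO_LINKED_PICTURE)
```

In the loop above, the test `if shape.Name.startswith('Picture'):` could then become `if is_picture(shape):`. Charts, text boxes and other drawing objects have different type values and would be skipped.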
9

An .xlsx file is actually a zip archive. You can get the images directly from its xl/media subfolder, for example in Python with the zipfile.ZipFile class. You don't need MS Excel installed, and you don't even need to run on Windows!
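
A minimal sketch of that approach, using only the standard library. The function name extract_xlsx_images and the output folder are assumptions for illustration. It applies to the zip-based formats such as .xlsx, not to the legacy binary .xls format.

```python
import zipfile
from pathlib import Path


def extract_xlsx_images(xlsx_path, out_dir):
    """Copy every file stored under xl/media/ in the workbook into out_dir.

    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for member in archive.namelist():
            # Skip directory entries; keep only real files in xl/media/.
            if member.startswith("xl/media/") and not member.endswith("/"):
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                saved.append(target)
    return saved


# Example call (the output folder name is arbitrary):
# extract_xlsx_images(r"C:\Users\file.xlsx", "images")
```

The images come out in their original format rather than being re-encoded as JPEG. The trade-off is that the file names in the archive don't say which sheet each image belongs to, so if you need that mapping, the COM approach is simpler.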

Lukman Salim
1

The file path and file name are defined in these variables:

wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)

In this particular case, the script uses the active workbook, which it gets on the preceding line:

workbook = excel.ActiveWorkbook

But in theory you should be able to specify the path through the wb_folder and wb_name variables, as long as you first open the file through the Excel COM object (see "Python: Open Excel Workbook using Win32 COM Api").

Mike