18

I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xlsx files, so I can not use the xlrd hyperlink_map function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):

import openpyxl

wb = openpyxl.load_workbook('testFile.xlsx')

ws = wb.get_sheet_by_name('Sheet1')

r = 0
c = 0

print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id

Output:

test

None

I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?

LucasS
  • 689
  • 3
  • 14
  • 24
  • Openpyxl's support for formatting is now pretty complete. The reason hyperlinks aren't so well supported is historical and also the way they are implemented in the Excel 2010 specification, which has lots of indirection. – Charlie Clark Jan 09 '15 at 14:26
  • Here's some more experiment with the limitation of OpenPyXl with hyperlinks. http://stackoverflow.com/questions/34964255/openpyxl-always-return-none-for-a-cell-with-hyperlink/34964292#34964292 – smwikipedia Jan 23 '16 at 14:05
  • There are multiple answers on here, but it's essential to have the "read_only" parameter set to False (or left blank since it's the default argument value in the constructor) for the cell.hyperlink object to be extracted when using the openpyxl Python package ... wb_source = openpyxl.load_workbook(filename, read_only=True) – JustBeingHelpful Mar 11 '21 at 19:50

8 Answers8

29

This is possible with openpyxl:

import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
# This will fail if there is no hyperlink to target
print(ws.cell(row=2, column=1).hyperlink.target)
wordsforthewise
  • 13,746
  • 5
  • 87
  • 117
4

Starting from at least version openpyxl-2.4.0b1 this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed. Now it's return for cell Hyperlink object:

hl_obj = ws.row(col).hyperlink  # getting Hyperlink object for Cell
#hl_obj = ws.cell(row = r, column = c).hyperlink This could be used as well.
if hl_obj:
    print(hl_obj.display)
    print(hl_obj.target)
    print(hl_obj.tooltip) # you can see it when hovering mouse on hyperlink in Excel
    print(hl_obj) # to see other stuff if you need
Hellohowdododo
  • 396
  • 3
  • 12
3

FYI, the problem with openpyxl is an actual bug.

And, yes, xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
1

In my experience getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with excel'

http://ironpython.net/

import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()

wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']

address = ws.Cells(row, col).Hyperlinks.Item(1).Address
battmatt
  • 34
  • 6
  • To use IronPython you need to install their distribution of Python, right? Never tried it because I've always used other distros like pythonxy, anaconda, and plain python installs. – wordsforthewise Feb 02 '21 at 03:21
1

A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls since hyperlink_map.get() works with xls.

John Paul Hayes
  • 778
  • 6
  • 13
1

For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
1
import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsx')
ws = wb['Sheet1']

try:
    print(ws.cell(row=2, column=1).hyperlink.target)

#This fail if their is no hyperlink
except:
    print(ws.cell(row=2, column=1).value)

In order to handle the exception 'message': "'NoneType' object has no attribute 'target'", we can use it in a try/except block. So even if there are no hyperlinks available in the given cell, it will print the content contained in the cell.

Spooky
  • 1,752
  • 21
  • 37
0

If instead of just .hyperlink, doing .hyperlink.target should work. I was getting a 'None' as well from using just ".hyperlink" on the cell object before that.