0

I have around 20 xlsx files that I would like to append using python. I can easily do that with pandas, the problem is that in the first column, I have hyperlinks and when I use pandas to append my xlsx files, I lose the hyperlink and get only the text in the column. Here is the code using pandas.

excels = [pd.ExcelFile(name) for name in files]
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
frames[1:] = [df[1:] for df in frames[1:]]
combined = pd.concat(frames)
combined.to_excel("c.xlsx", header=False, index=False)

Is there any way that I can append my files while retaining the hyperlinks? Is there a particular library that can do this?

Alex
  • 23
  • 4
  • Maybe [this](https://stackoverflow.com/a/42264209/9710458) could be useful for keeping the hyperlinks. – matkv Oct 01 '19 at 16:51

1 Answers1

0

It depends on how the hyperlinks are written in the original Excel files and on the Excel writer you use. read_excel will return the display text, e.g. if you have a hyperlink to https://www.google.com and the diplay text is just google, then there's no way to retain the link with pandas as you'll have just google in your dataframe.

If no separate display name is given (or the display name is identical with the hyperlink) and you use xlsxwriter (engine='xlsxwriter'), then the output of to_excel is automatically converted to hyperlinks (because it starts with 'http://' or any other scheme) (as of xlsxwriter version 1.1.5).

If you know that all your hyperlinks are 'http://' links with no authority and the display name (if different from the link) is just the url path, then you can prepend the 'http://' suffix and you'll get hyperlinks in the Excel file:

combined.iloc[combined[~combined.iloc[:,0].str.startswith('http')].index,0] = 'http://' + combined.iloc[combined[~combined.iloc[:,0].str.startswith('http')].index,0]
combined.to_excel("c.xlsx", header=False, index=False, engine='xlsxwriter')


A universal solution without pandas using openpyxl is shown in this answer to the same SO question where you took the pandas solution from. In order to copy hyperlinks too, you'll just have to add the following lines to the copySheet function:
        if cell.hyperlink is not None:
            newCell.hyperlink = copy(cell.hyperlink)
Stef
  • 28,728
  • 2
  • 24
  • 52
  • Yes, the display text is not the link but another name. Which gives only that display text and no link in my data frame, as you said. So with pandas it won't work, do you know if there is any other library that could do the job? – Alex Oct 02 '19 at 15:22
  • see my update "A universal ..." at the end of my answer. Tested with openpyxl version 3.0.0. – Stef Oct 02 '19 at 15:26