Suppose I have a DataFrame I want to export to a PDF. In the DataFrame I have the following columns: Code, Name, Price, Net, Sales. Every row is a Product.

I want to add an image to every product in that DataFrame, which I could get using BeautifulSoup. Is there some way to add the image to the DataFrame? Not the link, just the image of the product.

To be more specific, I want something like this:

enter image description here

Code:

import pandas as pd
df = pd.DataFrame([['A231', 'Book', 5, 3, 150], 
                   ['M441', 'Magic Staff', 10, 7, 200]],
                   columns = ['Code', 'Name', 'Price', 'Net', 'Sales'])

# Suppose these are the links to the images I want to add to the DataFrame
images = ['Link 1','Link 2'] 
Snedecor

1 Answer

You'll probably have to play around a bit with the width and height attributes, but this should get you started. Basically, you convert the image links to HTML <img> tags, then use df.to_html with escape=False so those tags are rendered instead of shown as text. Note that the images won't show if you're working in an IDE like PyCharm or Spyder, but as you can see from my output below, it works fine in Jupyter notebooks.

import pandas as pd
from IPython.display import display, HTML

df = pd.DataFrame([['A231', 'Book', 5, 3, 150], 
                   ['M441', 'Magic Staff', 10, 7, 200]],
                   columns = ['Code', 'Name', 'Price', 'Net', 'Sales'])

# your images
images1 = ['https://vignette.wikia.nocookie.net/2007scape/images/7/7a/Mage%27s_book_detail.png/revision/latest?cb=20180310083825',
          'https://i.pinimg.com/originals/d9/5c/9b/d95c9ba809aa9dd4cb519a225af40f2b.png'] 


images2 = ['https://static3.srcdn.com/wordpress/wp-content/uploads/2020/07/Quidditch.jpg?q=50&fit=crop&w=960&h=500&dpr=1.5',
           'https://specials-images.forbesimg.com/imageserve/5e160edc9318b800069388e8/960x0.jpg?fit=scale']

df['imageUrls'] = images1
df['otherImageUrls'] = images2


# convert your links to html tags 
def path_to_image_html(path):
    return '<img src="'+ path + '" width="60" >'

pd.set_option('display.max_colwidth', None)

image_cols = ['imageUrls', 'otherImageUrls']  # columns whose values will be converted to <img> tags

# Create the dictionary to be passed as formatters
format_dict = {}
for image_col in image_cols:
    format_dict[image_col] = path_to_image_html


display(HTML(df.to_html(escape=False, formatters=format_dict)))

Output

From there, you have a few options for getting to PDF.

You could save it as HTML

df.to_html('test_html.html', escape=False, formatters=format_dict)

then simply use an HTML-to-PDF converter here, or use a library such as pdfkit or WeasyPrint. I'm not entirely familiar with those (I only used one of them once, a long time ago), but here's a good link
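As a rough sketch of the library route, the snippet below builds the HTML string in memory and hands it to WeasyPrint. The helper name `html_to_pdf`, the placeholder image URL, and the output file name are made up for illustration, and it assumes WeasyPrint is installed and can fetch the image URLs when it renders. Treat it as a starting point rather than a tested recipe.

```python
import pandas as pd


def path_to_image_html(path):
    # Wrap an image URL in an <img> tag (same idea as the formatter above)
    return f'<img src="{path}" width="60">'


# Small stand-in DataFrame; the URL is a placeholder, not a real image
df = pd.DataFrame({
    'Code': ['A231'],
    'Name': ['Book'],
    'imageUrls': ['https://example.com/book.png'],
})

# Lift the column-width limit only while rendering, so long tags are not cut off
with pd.option_context('display.max_colwidth', None):
    html = df.to_html(escape=False, formatters={'imageUrls': path_to_image_html})


def html_to_pdf(html_string, pdf_path):
    # Hypothetical helper; assumes WeasyPrint is installed (pip install weasyprint)
    from weasyprint import HTML
    HTML(string=html_string).write_pdf(pdf_path)


# Usage (not run here, since it needs WeasyPrint and network access):
# html_to_pdf(html, 'products.pdf')
```

The import sits inside the helper so the rest of the snippet still runs on a machine without WeasyPrint.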

Good luck.

chitown88
  • Thanks for this great answer, @chitown88, it was exactly what I needed. The code just needs a bit of updating. Change: `from IPython.core.display import HTML` into `from IPython.core.display import display, HTML` And: `HTML(df.to_html(escape=False ,formatters=dict(image=path_to_image_html)))` into `display(HTML(df.to_html(escape=False ,formatters=dict(image=path_to_image_html))))`. As shown [here](https://stackoverflow.com/questions/25698448/how-to-embed-html-into-ipython-output "title") – Rens Oct 31 '20 at 08:10
  • @chitown88 the formatters argument is missing when saving to the HTML file: `df.to_html('test_html.html', escape=False, formatters=dict(image=path_to_image_html))` – kjsr7 Dec 30 '20 at 05:12
  • @chitown88 Also, `pd.set_option('display.max_colwidth', -1)` produces the warning `:19: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.`. It needs to be changed to `pd.set_option('display.max_colwidth', None)` – kjsr7 Dec 30 '20 at 05:18
  • How do you apply the formatter to multiple columns? I have a dataframe with image urls in two columns and want to render those but `formatters=[ path_to_image_html("imageUrls"), path_to_image_html("otherImageUrls") ]` didn't work – rom May 16 '21 at 04:49
  • Good question. I’ll update this code when I get a chance to sit down at my laptop in a few hours. – chitown88 May 16 '21 at 06:38
  • @rom, OK, I updated the code. The reason it didn't work for you is that you need to use a dictionary for the formatters. – chitown88 May 16 '21 at 16:47
  • There are multiple ways to do this, too. You could simply apply that function separately to each of the columns; then you wouldn't need to use the formatters param. – chitown88 May 16 '21 at 16:51
  • Thanks so much! I finally realized that the value of the dict is the reference to the formatting function: `{"columnA": format_function, "columnB": format_function}`. And by knowing this, I was able to put many columns into the dict, which ultimately allowed for multiple columns to show images. – rom May 16 '21 at 20:41
  • I don't think specifying the width in `'<img src="'+ path + '" width="60" >'` is making any difference. – dzenilee Jun 09 '22 at 04:39
  • @dzenilee, it does make a difference if you are saving as html. – chitown88 Jun 09 '22 at 07:47
  • I tried this solution, but it does not work if the images are at a local path, as in the case of a Colab notebook taking images from local content. – Y.AL Feb 04 '23 at 15:16
  • Not sure I understand what you mean. Are you saying the images you want in the dataframe are saved locally? – chitown88 Feb 05 '23 at 21:19
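
The alternative mentioned in the comments, applying the function to each column directly instead of passing formatters, might look roughly like the sketch below. The URLs are placeholders and the `rendered` copy is just one way to keep the original links untouched; neither comes from the answer itself.

```python
import pandas as pd


def path_to_image_html(path):
    # Wrap an image URL in an <img> tag
    return f'<img src="{path}" width="60">'


# Placeholder URLs standing in for real product images
df = pd.DataFrame({
    'Code': ['A231', 'M441'],
    'imageUrls': ['https://example.com/a1.png', 'https://example.com/b1.png'],
    'otherImageUrls': ['https://example.com/a2.png', 'https://example.com/b2.png'],
})

image_cols = ['imageUrls', 'otherImageUrls']

# Work on a copy so the original DataFrame keeps the plain links
rendered = df.copy()
for col in image_cols:
    rendered[col] = rendered[col].map(path_to_image_html)

# No formatters needed now: the cells already contain the tags
with pd.option_context('display.max_colwidth', None):
    html = rendered.to_html(escape=False)
```

The trade-off is that the tags end up stored in the DataFrame (here, in the copy), whereas the formatters dictionary only changes how the values are rendered.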