17

This may seem like a useless feature, but it would be very helpful for me. I would like to save the output I get inside the Canopy IDE. I don't think this is specific to Canopy, but for the sake of clarity that is what I use. For example, my console Out[2] is what I want from this:

[Image: console output Out[2]]

I think the formatting is quite nice, and reproducing it each time instead of just saving the output would be a waste of time. So my question is: how can I get a handle on this figure? Ideally the implementation would be similar to standard methods, so that it could be done like this:

from matplotlib.backends.backend_pdf import PdfPages

pp = PdfPages('Output.pdf')
fig = plt.figure() 
ax = fig.add_subplot(1, 1, 1)
df.plot(how='table')
pp.savefig()
pp.close()

NOTE: I realize that a very similar question has been asked before (How to save the Pandas dataframe/series data as a figure?), but it never received an answer and I think I have stated the question more clearly.

Keith
  • If you are willing to start over http://stackoverflow.com/questions/8524401/how-can-i-place-a-table-on-a-plot-in-matplotlib – Keith Jun 04 '15 at 16:21
  • 2
    So what's wrong with the output of `DataFrame.to_html()`, which allows you to scrape the cell contents with some fairly standard HTML analysis using something like Beautiful Soup? Would you like an answer showing how? You say you want to access the cell content, but you also say you want a PDF. These two requirements would appear to conflict. – holdenweb Jun 05 '15 at 14:32
  • I'm a bit confused as to what you want for your bounty and don't want to just offer an answer that doesn't get you nearer. You're obviously aware of `to_html` (and `to_latex` given the link you present) options for a `DataFrame`. What does that not give you? You can embed the latex into a matplotlib plot. Do you want to know how to embed the HTML into a pdf? – J Richard Snape Jun 09 '15 at 19:30
  • @Keith I had a guess what you might want to do and added an answer anyway - let me know if it fits what you wanted. It approaches the problem in a different way, not using matplotlib's pdf backend as the pdf rendering solution. – J Richard Snape Jun 11 '15 at 13:31

3 Answers

6

Here is a somewhat hackish solution but it gets the job done. You wanted a .pdf but you get a bonus .png. :)

import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

from PySide.QtGui import QImage
from PySide.QtGui import QPainter
from PySide.QtCore import QSize
from PySide.QtWebKit import QWebPage

arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df =pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))

h = "<!DOCTYPE html> <html> <body> <p> " + df.to_html() + " </p> </body> </html>";
page = QWebPage()
page.setViewportSize(QSize(5000,5000))

frame = page.mainFrame()
frame.setHtml(h, "text/html")

img = QImage(1000, 700, QImage.Format_ARGB32)  # named constant for format value 5
painter = QPainter(img)
frame.render(painter)
painter.end()
a = img.save("html.png")

pp = PdfPages('html.pdf')
fig = plt.figure(figsize=(8,6),dpi=1080) 
ax = fig.add_subplot(1, 1, 1)
img2 = plt.imread("html.png")
plt.axis('off')
ax.imshow(img2)
pp.savefig()
pp.close()

Edits welcome.

Keith
4

I believe it is an HTML table that your IDE is rendering. This is what the IPython notebook does.

You can get a handle on it like this:

from IPython.display import HTML
import pandas as pd
data = pd.DataFrame({'spam':['ham','green','five',0,'kitties'],
                     'eggs':[0,1,2,3,4]})
h = HTML(data.to_html())
h

and save to an HTML file:

my_file = open('some_file.html', 'w')
my_file.write(h.data)
my_file.close()
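
The same file can also be written with a `with` block, calling `to_html()` directly so that no IPython display object is needed. This is only a minimal sketch: the helper name `save_table_html` is a placeholder and not part of pandas, and the surrounding page markup is an assumption about what a standalone file should contain.

```python
import pandas as pd


def save_table_html(df, path):
    """Write df as a standalone HTML page and return the markup written."""
    # to_html() returns the <table> markup as a plain string
    page = "<!DOCTYPE html>\n<html><body>\n" + df.to_html() + "\n</body></html>\n"
    # the with block closes the file even if the write fails
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return page


data = pd.DataFrame({'spam': ['ham', 'green', 'five', 0, 'kitties'],
                     'eggs': [0, 1, 2, 3, 4]})
markup = save_table_html(data, 'some_file.html')
```

The returned string is the same markup that ends up in the file, so it can be passed on to an HTML-to-PDF converter without reading the file back.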
  • OK great, that gets me halfway, but I have zero HTML experience. I would like to get the HTML object into the pdf file I am saving my plots to. There is a toy example in my original question. – Keith Jul 07 '14 at 07:48
  • As you said ' So my question is, how can I get a handle on this figure?', that is what I answered. Does the table _need_ to be saved as a .pdf? I've updated the answer to save the html object to a file. – Laurence Billingham Jul 07 '14 at 08:58
  • @user262536 I don't know how to convert the HTML to a .pdf off the top of my head. This SO question might help: (http://stackoverflow.com/questions/4659058/how-to-save-html-elements-to-jpeg-png-or-pdf-using-python). Another method might be the `pandas.DataFrame.to_latex()` method and compile, along with the figure, with pdflatex or similar. I've never tried to do that either though. – Laurence Billingham Jul 07 '14 at 09:07
  • Sorry, I meant how can I get a handle in relation to the matplotlib classes. As in how can I get that table to be of a similar output to what is returned from matplotlib.pyplot.imread or matplotlib.pyplot.plot. I should have been more clear. – Keith Jul 07 '14 at 09:14
2

I think what is needed here is a consistent way of outputting a table to a pdf file amongst graphs output to pdf.

My first thought is not to use the matplotlib backend i.e.

from matplotlib.backends.backend_pdf import PdfPages

because it seemed somewhat limited in formatting options and leaned towards formatting the table as an image (thus rendering the text of the table in a non-selectable format).

If you want to mix dataframe output and matplotlib plots in a pdf without using the matplotlib pdf backend, I can think of two ways.

  1. Generate your pdf of matplotlib figures as before and then insert pages containing the dataframe table afterwards. I view this as a difficult option.
  2. Use a different library to generate the pdf. I illustrate one option to do this below.

First, install the xhtml2pdf library. It seems a little patchily supported, but it is active on GitHub and has some basic usage documentation. You can install it via pip, i.e. `pip install xhtml2pdf`.

Once you've done that, here is a barebones example embedding a matplotlib figure, then the table (all text selectable), then another figure. You can play around with CSS etc to alter the formatting to your exact specifications, but I think this fulfils the brief:

from xhtml2pdf import pisa             # this is the module that will do the work
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary); the with block closes it
    with open(outputFilename, "w+b") as resultFile:
        # convert HTML to PDF
        pisaStatus = pisa.CreatePDF(
                sourceHtml,                # the HTML to convert
                dest=resultFile,           # file handle to receive result
                path='.')                  # this path is needed so relative paths for
                                           # temporary image sources work

    # pisaStatus.err is 0 when there were no errors,
    # so this returns True on success and False on errors
    return not pisaStatus.err

# Main program
if __name__=='__main__':   
 
    arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))

    # Define your data
    sourceHtml = '<html><head>'         
    # add some table CSS in head
    sourceHtml += '''<style>
                     table, td, th {
                           border-style: double;
                           border-width: 3px;
                     }

                     td,th {
                           padding: 5px;
                     }
                     </style>'''
    sourceHtml += '</head><body>'
    # Add a matplotlib figure
    plt.plot(range(20))
    plt.savefig('tmp1.jpg')
    plt.close()                         # so the next plot starts on a fresh figure
    sourceHtml += '\n<p><img src="tmp1.jpg"></p>'
    
    # Add the dataframe
    sourceHtml += '\n<p>' + df.to_html() + '</p>'
    
    # Add another matplotlib figure
    plt.plot(range(70,100))
    plt.savefig('tmp2.jpg')
    plt.close()
    sourceHtml += '\n<p><img src="tmp2.jpg"></p>'
    
    sourceHtml += '</body></html>'
    outputFilename = 'test.pdf'
    
    convertHtmlToPdf(sourceHtml, outputFilename)

Note: There seems to be a bug in xhtml2pdf at the time of writing which means that some CSS is not respected. Particularly pertinent to this question, it seems impossible to get double borders around the table.


EDIT

In response to comments, it became obvious that some users (well, at least @Keith, who both answered and awarded a bounty!) want the table selectable, but definitely on a matplotlib axis. This is somewhat more in keeping with the original method. Hence, here is a method using only the pdf backend for matplotlib and matplotlib objects. I do not think the table looks as good, in particular the display of hierarchical column headers, but that's a matter of choice, I guess. I'm indebted to this answer and comments for the way to format axes for table display.

import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Main program
if __name__=='__main__':   
    pp = PdfPages('Output.pdf')
    arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df =pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))

    plt.plot(range(20))
    pp.savefig()
    plt.close()

    # Calculate some sizes for formatting - constants are arbitrary - play around
    nrows, ncols = len(df)+1, len(df.columns) + 10
    hcell, wcell = 0.3, 1.
    hpad, wpad = 0, 0   
    
    #put the table on a correctly sized figure    
    fig=plt.figure(figsize=(ncols*wcell+wpad, nrows*hcell+hpad))
    plt.gca().axis('off')
    # pandas.plotting.table was pandas.tools.plotting.table in pandas releases before 0.20
    matplotlib_tab = pd.plotting.table(plt.gca(), df, loc='center')
    pp.savefig()
    plt.close()

    #Add another matplotlib figure(s)
    plt.plot(range(70,100))
    pp.savefig()
    plt.close()
  
    pp.close()
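
One way to make the hierarchical headers more readable on a matplotlib axis might be to flatten each column tuple into a two-line label and call matplotlib's own `Axes.table`. The following is only a sketch under that assumption: the file name `table_sketch.pdf`, the figure size and the `scale` factors are arbitrary choices, and the `Agg` backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

arrays = [np.hstack([['one'] * 3, ['two'] * 3]), ['Dog', 'Bird', 'Cat'] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                  index=pd.date_range('20000103', periods=3))

# Flatten each (foo, bar) column tuple into a single two-line header label
col_labels = ['\n'.join(pair) for pair in df.columns]
row_labels = [stamp.strftime('%Y-%m-%d') for stamp in df.index]
cell_text = [['%.1f' % value for value in row] for row in df.values]

fig, ax = plt.subplots(figsize=(8, 2))
ax.axis('off')  # hide the axes so only the table is drawn
tab = ax.table(cellText=cell_text, rowLabels=row_labels,
               colLabels=col_labels, loc='center')
tab.scale(1, 2)  # taller cells so the two-line headers fit

# PdfPages can be used as a context manager, which closes the file on exit
with PdfPages('table_sketch.pdf') as pp:
    pp.savefig(fig)
plt.close(fig)
```

Further figures can be added to the same PDF by calling `pp.savefig()` again inside the `with` block.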
J Richard Snape
  • Thanks, this gets me much farther. The above image option gets the formatting but no selection and this gets the selection but no formatting. I'll give you the bounty but I am going to try and see if I can get a more reasonable formatting. – Keith Jun 11 '15 at 17:50
  • Sure, thanks. Now I know it's what you want, I'll have a look at bringing some CSS in to style the table, the docs imply that's possible. – J Richard Snape Jun 11 '15 at 19:21
  • Put in some CSS. Unfortunately it looks like it ignores the `border-style: double` directive, but the `border-width` and padding seem to be respected and make the layout somewhat nicer. I'm sure more can be done with CSS if you really need a specific layout. – J Richard Snape Jun 11 '15 at 20:11
  • I don't think I need a specific layout, I just think the one pictured above that comes from ipython looks good. I am sure many are equally good, but originally I thought that there would be some way to get this directly. Anyway, your above code does not run for me. I get >> CSSParseError: Selector name or qualifier expected:: (u'', u'\n – Keith Jun 12 '15 at 17:59
  • Sorry, that's a typo. The last > of – J Richard Snape Jun 12 '15 at 18:40
  • Yes, that is much nicer. I think I can get there by playing with the style settings. I am still interested in trying to find a way to put the table in a matplotlib axis. This would be a nice patch for Pandas. – Keith Jun 12 '15 at 18:54
  • I see what you mean. I'm not sure how feasible it is to do that and have selectability. I have a feeling that at a design level, the axis renders as a picture. If I have time, I'll return to this idea and see if there is more that could be done. – J Richard Snape Jun 12 '15 at 19:25
  • I am optimistic it is possible, since you can select the title, ylabel, xticklabels, etc. Perhaps doing it in a multilayered way, where the picture is a jpg of the empty table and the values are added through the annotate function. It may just become a nightmare though. – Keith Jun 12 '15 at 19:36
  • I have given it another try and could not get much farther. I guess @wes-mckinney is the only one around with a good enough understanding of the code base. – Keith Jun 16 '15 at 17:30
  • I had a play too. I think the problem with formatting in the pdf is the fault of xhtml2pdf ignoring (some) CSS, rather than Wes's problem in pandas. In terms of adding a table plot to a matplotlib figure, maybe he would be interested to do that. to_html code is [here](https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L1350). However, your reference to the pandas codebase made me look at [this](https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L2440) which looks promising... If I get time I'll examine that... – J Richard Snape Jun 16 '15 at 18:27
  • In particular, this function [`table`](https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L3042) looks interesting. It returns a `matplotlib.table` object, a search on which revealed [this question](http://stackoverflow.com/questions/17232683/creating-tables-in-matplotlib) which I think, in a roundabout way, might get us back to what you really want. I'll try to implement once I'm on a computer rather than phone. Let me know if you get there first. @Keith – J Richard Snape Jun 16 '15 at 18:48
  • @Keith just pinging you to let you know that I've added a new section with the table selectable and embedded on a matplotlib axis. I like a challenge :) – J Richard Snape Jun 16 '15 at 19:52