18

Is it possible to get a nicely formatted table from a pandas dataframe in ipython notebook when using nbconvert to latex & PDF?

The default seems to be just a left-aligned block of numbers in a shoddy looking font.

I would like to have something more like the html display of dataframes in the notebook, or a latex table. Saving and displaying a .png image of an HTML rendered dataframe would also be fine, but how exactly to do that has proved elusive.

Minimally, I would just like a simple centre-aligned table in a nice font.

I haven't had any luck with various attempts to use the .to_latex() method to get latex tables from pandas dataframes, either within the notebook or in nbconvert outputs. I've also tried (after reading ipython dev list discussions, and following the custom display logic notebook example) making a custom class with _repr_html_ and _repr_latex_ methods, returning the results of to_html() and to_latex(), respectively. I think a main problem with the nb conversion is that pdflatex isn't happy with either the {'s or the \\'s in the dataframe to_latex() output. But I don't want to start fiddling around with that before checking I haven't missed something.

Thanks.

J Grif
  • You don't need to create a custom class to add a formatter to existing classes: http://nbviewer.ipython.org/github/ipython/ipython/blob/master/examples/notebooks/Custom%20Display%20Logic.ipynb#Adding-IPython-display-support-to-existing-objects and, no, pandas has no way to make latex from a table IIRC. – Matt Dec 19 '13 at 16:52
  • I did something similar here but I am not happy with the solution. http://stackoverflow.com/questions/24574976/save-the-out-table-of-a-pandas-dataframe-as-a-figure?lq=1 – Keith Jun 04 '15 at 16:38

3 Answers

17

There is a simpler approach that is discussed in this Github issue. Basically, you have to add a _repr_latex_ method to the DataFrame class, a procedure that pandas describes in its official documentation.

I did this in a notebook like this:

import pandas as pd

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    # Raw string: "\c" is not a valid Python escape sequence.
    return r"\centering{%s}" % self.to_latex()

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

The following code:

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df

turns into an HTML table if evaluated live in the notebook, and it converts into a (centered) table in PDF format:

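$ ipython nbconvert --to latex --post PDF notebook.ipynb

On current Jupyter installations, where the command above fails with "No module named PDF", the equivalent should be:

$ jupyter nbconvert --to pdf notebook.ipynb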
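
A comment on this answer reports that everything after the first table also ends up centred. A likely cause is that \centering is a declaration rather than a command that takes an argument, so the braces in \centering{%s} do not limit its scope. The sketch below is one possible variant, not part of the original answer: it wraps the table in a center environment instead. The name centered_repr_latex is made up for illustration, and the try/except around the pandas import is only there so that the snippet also loads where pandas is missing.

```python
def centered_repr_latex(self):
    """Return the object's LaTeX table wrapped in a center environment.

    Works for anything with a to_latex() method, e.g. a pandas DataFrame.
    """
    # The environment closes the centring explicitly, so text that follows
    # the table keeps its normal alignment.
    return "\\begin{center}\n%s\n\\end{center}" % self.to_latex().strip()


try:
    import pandas as pd
    # Same monkey patch as in the answer, using the scoped wrapper.
    pd.DataFrame._repr_latex_ = centered_repr_latex
except ImportError:
    pass  # pandas not installed; the function can still be used on its own
```

If column names contain LaTeX markup, as one commenter describes, to_latex(escape=False) can be used inside the function in the same way.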
logc
  • That's handy. I have done something similar by defining a simple custom class that inherits the dataframe datatype and adds a similar _repr_latex_ method with formatting etc. Patching the dataframe class directly is more lightweight I guess. – J Grif Jun 12 '14 at 16:50
  • Hello, Nice work! I also added "escape=False" to the to_latex method, because I like to use LaTeX style strings as column names. – Ben K. Dec 05 '15 at 17:53
  • I submitted a [PR](https://github.com/pydata/pandas/pull/11778) which just got merged into the next pandas release (0.18) which fixes this issue. so starting with that version, the conversion should go smoothly. – Ben K. Dec 19 '15 at 17:47
  • 4
    I get `No module named PDF` – Ammar Alyousfi Dec 05 '18 at 16:06
  • 3
    This works, however, all content after the first table gets centered in my pdf. – Martin Thøgersen Feb 28 '20 at 14:03
  • 1
    Both links are broken. – Martin Thøgersen Feb 28 '20 at 14:07
  • Works for me. I would like to ask what would you do if the dataframe is styled, for example, if a row is highlighted? – ego2509 Mar 25 '21 at 15:18
  • @MartinThøgersen, Did you solve the issue with the text affected? – gustavovelascoh Jul 16 '21 at 13:02
  • Hello! That saved my life, just wondering if anyone had a solution to embed the latex commands \label and \legend, so that in the latex file I could also generate a list of figures and reference by the label automatically. Any thoughts on how to try that? – DTK Jan 16 '22 at 01:04
  • I have created this [post](https://stackoverflow.com/questions/70730280/add-latex-commands-label-and-caption-to-a-visual-studio-code-notebook-image-wh), to try to figure out a workaround for the \label and \legend. Please any ideas let me know Thanks – DTK Jan 16 '22 at 13:16
8

The simplest way available now is to display your dataframe as a markdown table. DataFrame.to_markdown() relies on the tabulate package, so you may need to install it first.

In your code cell, when displaying the dataframe, use the following:

from IPython.display import Markdown, display
display(Markdown(df.to_markdown()))

Since it is a markdown table, nbconvert can easily translate this into latex.
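
For reference, a markdown pipe table is plain text of the kind shown below. This sketch builds one by hand in plain Python, which could also serve as a fallback where tabulate is not available. rows_to_markdown is a hypothetical helper, not a pandas or IPython function, and the :---: separators request centred columns (whether the alignment survives the conversion depends on the converter).

```python
def rows_to_markdown(headers, rows):
    """Build a pipe-style markdown table from header names and row values."""
    header_line = "| " + " | ".join(str(h) for h in headers) + " |"
    # One ":---:" cell per column asks for centre alignment.
    separator = "|" + "|".join(":---:" for _ in headers) + "|"
    body = ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join([header_line, separator] + body)


table = rows_to_markdown(["one", "two"], [[1.0, 4.0], [2.0, 3.0]])
print(table)
```

In a notebook the result could be shown with display(Markdown(rows_to_markdown(df.columns, df.values.tolist()))).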

mbh86
Pushkar Nimkar
6

I wrote my own mako-based template scheme for this. I think it's actually quite an easy workflow if you commit to working through it yourself once. After that, you begin to see that templating the layout of your desired format, so that it is factored out of the code (and doesn't represent a third-party dependency), is a very nice way to solve it.

Here is the workflow I came up with.

  1. Write the .mako template that accepts your dataframe as an argument (and possibly other args) and converts it to the TeX format you want (example below).

  2. Make a wrapper class (I call it to_tex) that makes the API you desire (e.g. so you can pass it your data objects and it handles the call to mako render commands internally).

  3. Within the wrapper class, decide on how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a pdf?
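
The three steps can be sketched in miniature as follows. This is not the mako code from this answer: it uses string.Template from the standard library as a stand-in so that it runs without extra packages, and the names TABLE_TEMPLATE, format_cell and render_table are invented for the example.

```python
from string import Template

# Step 1: the template holds all TeX layout; $name marks a placeholder.
TABLE_TEMPLATE = Template(r"""\begin{table}
\caption{$title}
\begin{center}
\begin{tabular}{$colspec}
\hline
$header \\
\hline
$body
\hline
\end{tabular}
\end{center}
\end{table}
""")


def format_cell(x):
    """Format a number with two decimals."""
    return "%.2f" % x


# Step 2: a wrapper that hides the template call behind a small API.
def render_table(headers, rows, title):
    body = "\n".join(
        " & ".join(format_cell(v) for v in row) + r" \\" for row in rows
    )
    return TABLE_TEMPLATE.substitute(
        title=title,
        colspec="c" * len(headers),  # one centred column per header
        header=" & ".join(headers),
        body=body,
    )


# Step 3: decide what to do with the output; here it is simply printed.
tex = render_table(["one", "two"], [[1.0, 4.0], [2.0, 3.0]], "Demo table")
print(tex)
```

With a DataFrame, the call could look like render_table(list(df.columns), df.values.tolist(), "Some title").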

In my case, I was working on generating preliminary results for a research paper and needed to format tables into a complicated double-sorted structure with nested column names, etc. Here's an example of what one of the tables looks like:

[Image: example output from the templated TeX tool]

Here is the mako template for this (warning, gross):

<%page args="df, table_title, group_var, sort_var"/>
<%
"""
Template for country/industry two-panel double sorts TeX table.
Inputs: 
-------
df: pandas DataFrame
    Must be 17 x 12 and have rows and columns that positionally
    correspond to the entries of the table.

table_title: string
    String used for the title of the table.

group_var: string
    String naming the grouping variable for the horizontal sorts.
    Should be 'Country' or 'Industry'.

sort_var: string (raw)
    String naming the variable that is being sorted, e.g.
    "beta" or "ivol". Note that if you want the symbol to
    be rendered as a TeX symbol, then pass a raw Python
    string as the arg and include the needed TeX markup in
    the passed string. If the string isn't raw, some of the
    TeX markup might be interpreted as special characters.

Returns:
--------
When used with mako.template.Template.render, will produce
a raw TeX string that can be rendered into a PDF containing
the specified data.

Author:
-------
Ely M. Spears, 05/21/2013

"""
# Python imports and helper function definitions.
import numpy as np  
def format_helper(x):
    return str(np.round(x,2))
%>


<%text>
\documentclass[10pt]{article}
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\setlength{\parskip}{1em}
\setlength{\parindent}{0in}
\renewcommand*\arraystretch{1.5}
\author{Ely Spears}


\begin{document}
\begin{table} \caption{</%text>${table_title}<%text>}
\begin{center}
    \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
    \hline
    & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
    \cline{2-7} \cline{9-14}
    & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
    Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
    \hline
    \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
    \hline
    Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
        & </%text>${format_helper(df.iloc[6, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
        & </%text>${format_helper(df.iloc[7, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7, 11])}<%text> \\


    \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8, 11])}<%text>  \\
    \hline
    \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
    \hline
    Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
        & </%text>${format_helper(df.iloc[15, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
        & </%text>${format_helper(df.iloc[16, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16, 11])}<%text> \\
    \hline
    \end{tabular}
\end{center}
\end{table}
\end{document}
</%text>

My wrapper to_tex.py looks like this (with example usage in the if __name__ == "__main__" section):

"""
to_tex.py

Class for handling strings of TeX code and producing the
rendered PDF via pdflatex. Assumes pdflatex can be called
through the operating system.
"""
from __future__ import print_function

import os
from subprocess import Popen


class to_tex(object):
    """
    Publishes a TeX string to a PDF rendering with pdflatex.
    """
    def __init__(self, tex_string, tex_file, display=False):
        """
        Publish a string to a .tex file, which will be
        rendered into a .pdf file via pdflatex.
        """
        self.tex_string = tex_string
        self.tex_file = tex_file
        self.__to_tex_file()
        self.__to_pdf_file(display)
        print("Render status:", self.render_status)

    def __to_tex_file(self):
        """
        Writes a tex string to a file.
        """
        with open(self.tex_file, 'w') as t_file:
            t_file.write(self.tex_string)

    def __to_pdf_file(self, display=False):
        """
        Compile a tex file to a pdf file with the
        same file path and name.
        """
        try:
            # Fall back to "." when tex_file has no directory part.
            out_dir = os.path.dirname(self.tex_file) or "."
            proc = Popen(["pdflatex", "-output-directory", out_dir, self.tex_file])
            proc.communicate()
            # Popen does not raise when pdflatex fails, so check the exit code.
            if proc.returncode == 0:
                self.render_status = "success"
            else:
                self.render_status = "pdflatex exited with code %d" % proc.returncode
        except Exception as e:
            self.render_status = str(e)

        # Launch a display of the pdf if requested.
        if (self.render_status == "success") and display:
            try:
                proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                proc.communicate()
            except OSError:
                # No viewer available; the PDF is still on disk.
                pass

if __name__ == "__main__":
    from mako.template import Template
    template_file = "path/to/template.mako"
    tex_file = "path/to/output.tex"  # placeholder path, like template_file
    t = Template(filename=template_file)
    # ... pass whatever arguments the template declares in its page tag.
    tex_str = t.render(arg1="arg1")
    tex_wrapper = to_tex(tex_str, tex_file)

My choice was to directly pump the TeX string to pdflatex and leave as an option to display it.
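
One caveat when calling pdflatex from a script: by default it stops and waits for keyboard input when it hits a TeX error, and a failed compile does not raise an exception in Python. A possible way to handle both is sketched below, under the assumption that pdflatex is on the PATH. build_pdflatex_command and compile_pdf are hypothetical helpers, not part of the wrapper above.

```python
import os
import subprocess


def build_pdflatex_command(tex_file):
    """Return the pdflatex argument list for compiling tex_file."""
    # Write the PDF next to the .tex file; "." covers bare file names.
    out_dir = os.path.dirname(tex_file) or "."
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # never wait for keyboard input
        "-halt-on-error",            # stop at the first TeX error
        "-output-directory", out_dir,
        tex_file,
    ]


def compile_pdf(tex_file):
    """Run pdflatex and return True if it exited with status 0.

    Raises FileNotFoundError if pdflatex is not installed.
    """
    result = subprocess.run(build_pdflatex_command(tex_file))
    return result.returncode == 0
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact command before anything is executed.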

A small snippet of code actually using this with a DataFrame is here:

import pandas
from mako.template import Template

import my_project.to_tex as to_tex

# Assume calculation work is done prior to this ...
all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
all_df = pandas.concat([all_beta, all_alpha], axis=1)

# Render result in TeX
tex_mako = "/my_project/templates/mako/two_panel_double_sort_table.mako"
tex_file = "/my_project/some_tex_file_name.tex"

t = Template(filename=tex_mako)
# Keyword names match the arguments declared in the template's page tag.
tex_str = t.render(df=all_df, table_title=table_title,
                   group_var=group_var, sort_var=tex_risk_name)

# Writes the .tex file and compiles it with pdflatex.
tex_obj = to_tex.to_tex(tex_str, tex_file)
ely
  • 2
    That's nice, any chance you could make it a package (or maybe a PR against https://gist.github.com/takluyver/5098835)? We should really have a Table Package for IPython that spits out many formats! – Matt Dec 19 '13 at 16:51