
I have a dataframe

             a             b           c
0  2610.101010  13151.030303   33.000000
1  1119.459459   5624.216216   65.777778
2  3584.000000  18005.333333    3.000000
3  1227.272727   5303.272727   29.333333
4  1661.156504   8558.836558  499.666667

I am plotting a histogram of each column with plotly.express and printing the matching describe table with the following simple code:

import plotly.express as px

# df is the dataframe shown above
for col in df.columns:
    px.histogram(df, x=col, title=col).show()
    print(df[col].describe())

Is it possible to add the describe table next to each histogram and save all the plots (together with their respective tables) in a single PDF?

quant
  • You can read this answer https://stackoverflow.com/a/27327984/14280520 to save multiple images in a single pdf. – lauriane.g Oct 22 '20 at 12:33
  • You can also use reportlab to do it. Look at this tutorial http://www.blog.pythonlibrary.org/2010/03/08/a-simple-step-by-step-reportlab-tutorial/ and the userguide https://www.reportlab.com/docs/reportlab-userguide.pdf – lauriane.g Oct 22 '20 at 12:38
  • None of these solutions worked for me, unfortunately – quant Oct 22 '20 at 12:43

1 Answer


One way to achieve this is to create a subplot grid with one row per dataframe column and two subplot columns (one for the histogram and one for the table). For example:

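from plotly.subplots import make_subplots

# df is the dataframe from the question
n_rows = len(df.columns)

# Subplot titles alternate: histogram title, then table title, for each column
titles = [title
          for col in df.columns
          for title in (f"Histogram of {col}", f"Stats of {col}")]

# One row per dataframe column: histogram on the left, table on the right
fig = make_subplots(rows=n_rows,
                    cols=2,
                    specs=[[{"type": "histogram"}, {"type": "table"}]] * n_rows,
                    subplot_titles=titles)

for i, col in enumerate(df.columns, start=1):
    fig.add_histogram(x=df[col], row=i, col=1)
    # describe() gives the statistic names (index) and their values;
    # the table expects one list per table column
    stats = df[col].describe()
    fig.add_table(header=dict(values=['Statistic', 'Value']),
                  cells=dict(values=[stats.index.tolist(), stats.tolist()]),
                  row=i,
                  col=2)

fig.update_layout(showlegend=False)
fig.show()

# Static export requires kaleido (or orca) to be installed
fig.write_image("example_output.pdf")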

Finally, you can save the whole figure (all 6 subplots together) as a PDF using .write_image(). You will need to install either kaleido or orca to do so. The resulting PDF shows each histogram next to its statistics table, and you can of course customize the layout.

If you need each graph + table on a separate page of the PDF, you can take advantage of the PyPDF2 library. First, save each graph + table as its own PDF (as described above, but producing one PDF file per column rather than a single file), then merge those files into one PDF with PyPDF2.

tania
  • I'd like every histogram with its corresponding table to be on a separate page of the pdf, because this solution would have issues if you want to plot e.g. 10 columns, right? – quant Oct 22 '20 at 14:27
  • @quant Doesn't that defy this: `Is it possible to add next to each histogram the describe and save all the plots (together with their respective histograms) in a single pdf`? If it doesn't, sorry for bothering you. If it does, then please upvote and accept this answer, and ask a new question that takes care of the mentioned detail. – vestland Oct 22 '20 at 15:25
  • 2
    @quant `write_image` from plotly doesn't give you much flexibility on that front. As you said, the figure would be saved as a single PDF page. I added a paragraph in the end explaining how you could do that. Essentially, instead of creating 1 single `fig`, you could create + save a `fig` as pdf for each graph + table. Then you could use the `PyPDF2` package to concatenate all the files into one single PDF file, where each row is one single page. – tania Oct 22 '20 at 15:38