I would like to create a single PDF with multiple pages, where each page contains a table. I have a large dataframe that I split into several sub-dataframes, and I want each sub-dataframe to get its own page in the PDF.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame(np.random.randint(0,100,size=(150, 4)), columns=list('ABCD'))
df_list = np.array_split(df, 3)
cell_text = []
with PdfPages('multipage_output_pdf.pdf') as pdf:
    for table in df_list:
        fig = plt.figure(figsize=(11.69,8.27))
        ax = fig.add_subplot(111)
        for row in range(len(table)):
            cell_text.append(table.iloc[row])
        ax.table(cellText=cell_text, colLabels=table.columns, loc='center')
        ax.axis('off')
        pdf.savefig(fig)
pdf.close()

I tried the code above, but I get only one page (only one sub-dataframe) in the output PDF. How can I display all the dataframes in the PDF?

keerthu p

1 Answer

The problem is that cell_text is not reset to an empty list at the start of each loop iteration, so each successive table also contains the rows of the previous one(s). In any case, cell_text is not needed, because the cell values can be accessed directly with table.values.
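The accumulation can be seen without producing any PDF. The following is a minimal sketch, assuming a small 6-row dataframe and positional slices (the name chunks is illustrative) in place of the sub-dataframes from the question. It records how many rows the shared list holds after each chunk and compares that with the number of rows table.values returns per chunk.

```python
import numpy as np
import pandas as pd

# Small stand-in dataframe (assumption: 6 rows instead of the 150 in the question).
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list("ABCD"))
# Positional slices stand in for the sub-dataframes from the question.
chunks = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:6]]

# Pattern from the question: the list is created once, so rows pile up across chunks.
cell_text = []
accumulated_sizes = []
for table in chunks:
    for row in range(len(table)):
        cell_text.append(table.iloc[row])
    accumulated_sizes.append(len(cell_text))

# Pattern from the answer: take only the values of the current chunk.
per_chunk_sizes = [len(table.values) for table in chunks]

print(accumulated_sizes)  # [2, 4, 6]
print(per_chunk_sizes)    # [2, 2, 2]
```

Resetting cell_text to an empty list at the top of the outer loop would give the same per-chunk sizes as table.values.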

In the following example, the figure dimensions are swapped to give the A4 pages a portrait orientation, so that each table fits on a single page. The column widths are also reduced to improve the format of the table. The pyplot interface is used exclusively to simplify the code a bit.

import numpy as np               # v 1.19.2
import pandas as pd              # v 1.2.3
import matplotlib.pyplot as plt  # v 3.3.4
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame(np.random.randint(0, 100, size=(150, 4)), columns=list('ABCD'))
df_list = np.array_split(df, 3)
with PdfPages('multipage_output_pdf.pdf') as pdf:
    for table in df_list:
        # One new A4 portrait figure (width, height in inches) per sub-dataframe
        plt.figure(figsize=(8.27, 11.69))
        # table.values holds only the rows of the current sub-dataframe
        plt.table(cellText=table.values, colLabels=table.columns, loc='center',
                  colWidths=[0.1] * len(table.columns))
        plt.axis('off')
        pdf.savefig()  # saves the current figure as a new page
        plt.close()    # closes the current figure before the next iteration
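For readers who prefer the object-oriented interface used in the question, the same fix might look like the sketch below. It assumes a non-interactive Agg backend, a temporary output directory, and positional slicing in place of np.array_split. The helper name write_tables_to_pdf is hypothetical, not part of matplotlib. It returns the page count reported by PdfPages.get_pagecount() as a quick check that every sub-dataframe got its own page.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def write_tables_to_pdf(frames, path):
    """Write one table per page and return the number of pages written.

    Hypothetical helper for illustration; not a matplotlib API.
    """
    with PdfPages(path) as pdf:
        for table in frames:
            # A4 portrait, dimensions in inches
            fig, ax = plt.subplots(figsize=(8.27, 11.69))
            ax.table(cellText=table.values, colLabels=list(table.columns),
                     loc="center", colWidths=[0.1] * len(table.columns))
            ax.axis("off")
            pdf.savefig(fig)
            plt.close(fig)  # release the figure before building the next page
        # Read the page count while the file is still open
        page_count = pdf.get_pagecount()
    return page_count


df = pd.DataFrame(np.random.randint(0, 100, size=(150, 4)), columns=list("ABCD"))
# Three sub-dataframes of 50 rows each (assumed split, equivalent in size to the question's)
frames = [df.iloc[start:start + 50] for start in range(0, len(df), 50)]
out_path = os.path.join(tempfile.mkdtemp(), "multipage_output_pdf.pdf")
n_pages = write_tables_to_pdf(frames, out_path)
print(n_pages)
```

Passing the figure explicitly to pdf.savefig and plt.close avoids relying on pyplot's notion of the current figure, which can help when other figures are open at the same time.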

Reference: answer by user3226167

Patrick FitzGerald