I'm stuck at a point where I have to write multiple pandas dataframe's to a PDF file.The function accepts dataframe as input.
However, I'm able to write to PDF for the first time but all the subsequent calls are overriding the existing data, leaving with only one dataframe in the PDF by the end.
Please find the python function below :
def fn_print_pdf(df):
pp = PdfPages('Sample.pdf')
total_rows, total_cols = df.shape;
rows_per_page = 30; # Number of rows per page
rows_printed = 0
page_number = 1;
while (total_rows >0):
fig=plt.figure(figsize=(8.5, 11))
plt.gca().axis('off')
matplotlib_tab = pd.tools.plotting.table(plt.gca(),df.iloc[rows_printed:rows_printed+rows_per_page],
loc='upper center', colWidths=[0.15]*total_cols)
#Tabular styling
table_props=matplotlib_tab.properties()
table_cells=table_props['child_artists']
for cell in table_cells:
cell.set_height(0.024)
cell.set_fontsize(12)
# Header,Footer and Page Number
fig.text(4.25/8.5, 10.5/11., "Sample", ha='center', fontsize=12)
fig.text(4.25/8.5, 0.5/11., 'P'+str(page_number), ha='center', fontsize=12)
pp.savefig()
plt.close()
#Update variables
rows_printed += rows_per_page;
total_rows -= rows_per_page;
page_number+=1;
pp.close()
And I'm calling this function as ::
raw_data = {
'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns=['subject_id', 'first_name', 'last_name'])
fn_print_pdf(df_a)
raw_data = {
'subject_id': ['4', '5', '6', '7', '8'],
'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns=['subject_id', 'first_name', 'last_name'])
fn_print_pdf(df_b)
PDF file is available at SamplePDF .As you can see only the data from second dataframe is saved ultimately.Is there a way to append to the same Sample.pdf in the second pass and so on while still preserving the former data?