
I'm stuck trying to write multiple pandas DataFrames to a single PDF file. My function accepts a DataFrame as input.

The first call writes to the PDF correctly, but every subsequent call overwrites the existing data, so the PDF ends up containing only one DataFrame.

Here is the Python function:

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

def fn_print_pdf(df):
    pp = PdfPages('Sample.pdf')
    total_rows, total_cols = df.shape

    rows_per_page = 30  # Number of rows per page
    rows_printed = 0
    page_number = 1
    while total_rows > 0:
        fig = plt.figure(figsize=(8.5, 11))
        plt.gca().axis('off')
        # pd.plotting.table was pd.tools.plotting.table in older pandas releases
        matplotlib_tab = pd.plotting.table(plt.gca(), df.iloc[rows_printed:rows_printed + rows_per_page],
                                           loc='upper center', colWidths=[0.15] * total_cols)
        # Tabular styling
        for cell in matplotlib_tab.get_celld().values():
            cell.set_height(0.024)
            cell.set_fontsize(12)
        # Header, footer and page number
        fig.text(4.25 / 8.5, 10.5 / 11., "Sample", ha='center', fontsize=12)
        fig.text(4.25 / 8.5, 0.5 / 11., 'P' + str(page_number), ha='center', fontsize=12)
        pp.savefig(fig)
        plt.close(fig)
        # Update variables
        rows_printed += rows_per_page
        total_rows -= rows_per_page
        page_number += 1
    pp.close()

I'm calling this function as follows:

raw_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns=['subject_id', 'first_name', 'last_name'])
fn_print_pdf(df_a)

raw_data = {
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns=['subject_id', 'first_name', 'last_name'])
fn_print_pdf(df_b)   

The PDF file is available at SamplePDF. As you can see, only the data from the second DataFrame is ultimately saved. Is there a way to append to the same Sample.pdf on the second and later calls while preserving the earlier data?

Serenity
Sri
  • If you need all data in one table, send a concatenated df in one call: `fn_print_pdf(pd.concat([df_a, df_b]))` – Parfait Jul 31 '16 at 19:13
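
Parfait's suggestion can be sketched as follows, reusing the sample data from the question. The `ignore_index=True` argument is my own addition (an assumption, not part of the comment); it renumbers the rows so the combined table does not repeat the index 0 to 4 twice.

```python
import pandas as pd

columns = ['subject_id', 'first_name', 'last_name']

df_a = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}, columns=columns)

df_b = pd.DataFrame({
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}, columns=columns)

# Stack the two frames vertically into one 10-row frame.
# ignore_index=True is an assumption added here to get a fresh 0..9 index.
combined = pd.concat([df_a, df_b], ignore_index=True)

# The combined frame can then be passed to the function in a single call:
# fn_print_pdf(combined)
```

This only helps if one continuous table is acceptable; it does not keep the two DataFrames on separate pages.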

1 Answer


Your PDF is being overwritten because you create a new PDF document every time you call fn_print_pdf(). Instead, keep your PdfPages instance open between function calls, and call pp.close() only after all your plots are written. For reference, see this answer.

Another option is to write each PDF to a different file and use pyPDF to merge them; see this answer.

Edit: Here is some working code for the first approach.

Your function is modified to:

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

def fn_print_pdf(df, pp):
    # pp is an already-open PdfPages instance owned by the caller
    total_rows, total_cols = df.shape

    rows_per_page = 30  # Number of rows per page
    rows_printed = 0
    page_number = 1
    while total_rows > 0:
        fig = plt.figure(figsize=(8.5, 11))
        plt.gca().axis('off')
        # pd.plotting.table was pd.tools.plotting.table in older pandas releases
        matplotlib_tab = pd.plotting.table(plt.gca(), df.iloc[rows_printed:rows_printed + rows_per_page],
                                           loc='upper center', colWidths=[0.15] * total_cols)
        # Tabular styling
        for cell in matplotlib_tab.get_celld().values():
            cell.set_height(0.024)
            cell.set_fontsize(12)
        # Header, footer and page number
        fig.text(4.25 / 8.5, 10.5 / 11., "Sample", ha='center', fontsize=12)
        fig.text(4.25 / 8.5, 0.5 / 11., 'P' + str(page_number), ha='center', fontsize=12)
        pp.savefig(fig)
        plt.close(fig)
        # Update variables
        rows_printed += rows_per_page
        total_rows -= rows_per_page
        page_number += 1

Call your function with:

pp = PdfPages('Sample.pdf')
fn_print_pdf(df_a,pp)
fn_print_pdf(df_b,pp)   
pp.close()
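
As a variation on the calls above, PdfPages can also be used as a context manager, so the file is closed even if one of the writes raises. The following is only a minimal sketch: `add_table_page` and `write_report` are hypothetical helper names, and it draws the tables with matplotlib's own `Axes.table` rather than the pandas helper to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def add_table_page(pp, title, columns, rows):
    """Hypothetical helper: draw one table on a letter-size page and append it to pp."""
    fig = plt.figure(figsize=(8.5, 11))
    ax = fig.gca()
    ax.axis("off")
    ax.table(cellText=rows, colLabels=columns, loc="upper center")
    fig.text(0.5, 10.5 / 11, title, ha="center", fontsize=12)
    pp.savefig(fig)   # appends a page to the open document
    plt.close(fig)


def write_report(path, tables):
    """Hypothetical helper: write every table into one PDF and return the page count."""
    # One PdfPages instance is shared by all pages; leaving the with block closes it.
    with PdfPages(path) as pp:
        for title, columns, rows in tables:
            add_table_page(pp, title, columns, rows)
        return pp.get_pagecount()
```

Calling `write_report('Sample.pdf', [...])` with one `(title, columns, rows)` tuple per table produces one page per tuple in a single file.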
user666
  • Great ..it worked..use of pp across the session worked..Many thanks for your help :) – Sri Aug 01 '16 at 04:18