How to print dataframe row by row into pdf and to align it in page?

Question

I want to print the dataframe into a PDF in a table-like structure. I also have other data that I want to print on the same page. I tried printing the dataframe row by row, and this is what I have:

from fpdf import FPDF
import pandas as pd

pdf = FPDF(format='letter', unit='in')

pdf.add_page()

pdf.set_font('helvetica', 'BU', 8)

pdf.ln(0.25)
data = [
    [1, 'denumire1', 'cant1', 'pret1', 'valoare1'],
    [2, 'denumire2', 'cant2', 'pret2', 'valoare2'],
    [3, 'denumire3', 'cant3', 'pret3', 'valoare3'],
    [4, 'denumire4', 'cant4', 'pret4', 'valoare4'],
]


df = pd.DataFrame(data, columns=['Nr. crt.', 'Denumire', 'Cant.', 'Pret unitar', 'Valoarea'])


for index, row in df.iterrows():
    pdf.cell(7, 0.5,str(row['Nr. crt.'])+str(row['Denumire'])+ str(row['Cant.'])+ str(row['Pret unitar'])+ str(row['Valoarea']))

pdf.output('test.pdf', 'F')

However, the format is not readable.

How can I print the dataframe to the PDF using FPDF so that it is aligned on the page?

This is how the dataframe looks now, using the given code:

What is the `fpdf` library that you are using here? Neither in original PyFPDF (fpdf from PyPI) nor in fpdf2 the signature for `cell` is consistent with your code... — Serge Ballesta, Oct 07 '22 at 12:50
@SergeBallesta sorry, I copied by mistake the outdated version of the code. Updated it now. This is the library: https://pypi.org/project/fpdf/ — kitten_world, Oct 07 '22 at 12:58
I no longer have any error with that new code... A test.pdf file is even correctly created (even if its content is probably not what you want...) — Serge Ballesta, Oct 07 '22 at 13:21
Is it possible to align the table so that the content can be readable even if the dataframe content changes? Also now the content is not fit to the page and cannot be fully readable... — kitten_world, Oct 07 '22 at 13:24
It is now a quite different problem. You should delete this question and ask a new one explaining what you get and what you want. Or as there is no answer here, you could also rewrite this question... — Serge Ballesta, Oct 07 '22 at 14:23
And if all you want is just to format your dataframe into a pdf file, you should look at this [other SO question](https://stackoverflow.com/q/33155776/3545273). The answers propose some possible ways, probably simpler than directly using fpdf, which is a rather low-level package. — Serge Ballesta, Oct 07 '22 at 14:30
@SergeBallesta thanks, but none of it is using the FPDF library. I need to add more text to the pdf and the text is already formatted(prepared) using FPDF library. — kitten_world, Oct 07 '22 at 14:34
Try looking at [How to write structured and unstructured data to PDF using Python](https://dock2learn.com/tech/how-to-write-structured-and-unstructured-data-to-pdf-using-python/). — Alias Cartellano, Oct 07 '22 at 18:53

score 2 · Answer 1 · answered Oct 08 '22 at 08:37

The fpdf module is a rather low-level library. You have to explicitly write each cell after computing the cell width. Here you use Letter size (8.5 x 11 in.) and have 5 columns, so a width of 1.6 in. per cell seems reasonable. The code could be:

...
for index, row in df.iterrows():
    for data in row.values:
        pdf.cell(1.6, 0.5, str(data))  # write each data for the row in its cell
    pdf.ln()                           # go to next line after each row
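
Rather than hard-coding the column width, it can be derived from the page width and the margins. The sketch below is only an illustration and not part of the original answer: `equal_column_width` is a hypothetical helper, and the numbers assume a Letter page (8.5 in wide) with FPDF's default 1 cm margins.

```python
def equal_column_width(page_width, left_margin, right_margin, n_columns):
    """Split the printable width of a page evenly between n_columns."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    printable = page_width - left_margin - right_margin
    if printable <= 0:
        raise ValueError("margins leave no printable width")
    return printable / n_columns


# Assumed values: Letter width in inches, 1 cm margins converted to inches.
CM_IN_INCHES = 1 / 2.54
width = equal_column_width(8.5, CM_IN_INCHES, CM_IN_INCHES, n_columns=5)
print(round(width, 2))  # 1.54
```

With a live document, the same values can be read from the `FPDF` object as `pdf.w`, `pdf.l_margin` and `pdf.r_margin`, which are expressed in the unit chosen in the constructor.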

Serge Ballesta

That works fine, but if a cell contains a long text, it would overlap the other cells. Do you know how to fix this? – kitten_world Oct 10 '22 at 07:59

score 0 · Answer 2 · answered Feb 16 '23 at 21:06

IMPORTANT: my solution iterates through the DataFrame row by row. I know this is not ideal, since it is slow for large DataFrames, but since you are printing the results as a table I'm assuming the sample is small. For larger data, consider a more efficient method.
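
If the row-by-row loop becomes a bottleneck, one commonly used alternative to `iterrows()` is `itertuples()`, which yields plain tuples instead of building a Series for every row. This is a minimal sketch on a shortened copy of the sample data. The `rows` name is only illustrative.

```python
import pandas as pd

data = [
    [1, 'denumire1', 'cant1', 'pret1', 'valoare1'],
    [2, 'denumire2', 'cant2', 'pret2', 'valoare2'],
]
df = pd.DataFrame(data, columns=['Nr. crt.', 'Denumire', 'Cant.', 'Pret unitar', 'Valoarea'])

# index=False drops the index; name=None yields plain tuples rather than namedtuples.
rows = [tuple(str(value) for value in row)
        for row in df.itertuples(index=False, name=None)]

for row in rows:
    print(row)
```

Each tuple of strings can then be written to the PDF cell by cell, in the same way as inside an `iterrows()` loop.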

First, let's import the needed modules and create the DataFrame:

import pandas as pd
import math
from fpdf import FPDF

data = [
    [1, 'denumire1', 'cant1', 'pret1', 'valoare1'],
    [2, 'denumire2', 'cant2', 'pret2', 'valoare2'],
    [3, 'denumire3', 'cant3', 'pret3', 'valoare3'],
    [4, 'denumire4', 'cant4', 'pret4', 'valoare4'],
    ]

df = pd.DataFrame(data, columns=['Nr. crt.', 'Denumire', 'Cant.', 'Pret unitar',
                                 'Valoarea'])

Now we can create our document, add a page, and set the margins and font:

# Creating document
pdf = FPDF("P", "mm", "A4")
pdf.set_margins(left= 10, top= 10)
pdf.set_font("Helvetica", style= "B", size= 14)
pdf.set_text_color(r= 0, g= 0, b= 0)
pdf.add_page()

Now we can create the first element of our table: the header. I'm assuming only the given columns will be printed in the table, so I'll use their names as headers. Since we have 5 columns with multiple characters, we must take into consideration that the header might need more than one line, in case a cell has too many characters for a single line.

To solve that, the line height must equal the font size times the number of lines needed (e.g., if a string has a width of 150 and the cell has a width of 100, you will need 2 lines, which is 1.5 rounded up). We need to do this for every column name and use the highest value as our number of lines.

Also, I'm assuming you will equally divide the whole width of the page minus margins for the 5 columns (cells).

# Creating our table headers
cell_width = (210 -10 -10) / len(df.columns)
line_height = pdf.font_size
number_lines = 1
for i in df.columns:
    new_number_lines = math.ceil(pdf.get_string_width(str(i)) / cell_width)
    if new_number_lines > number_lines:
        number_lines = new_number_lines
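
The estimate above divides the text width by the full cell width. FPDF also keeps a small horizontal padding inside each cell (the `c_margin` attribute), so the width available to the text is slightly smaller. The sketch below is a hedged variation on the same idea: `estimate_lines` and `lines_for_row` are hypothetical helpers, and because `multi_cell` breaks lines at word boundaries the result remains an approximation.

```python
import math


def estimate_lines(text_width, cell_width, cell_padding=0.0):
    """Estimate how many lines a string of text_width needs in one cell."""
    usable = cell_width - 2 * cell_padding  # padding applies on both sides
    if usable <= 0:
        raise ValueError("cell is too narrow for its padding")
    return max(1, math.ceil(text_width / usable))


def lines_for_row(text_widths, cell_width, cell_padding=0.0):
    """Return the largest line count needed by any cell in the row."""
    return max(estimate_lines(w, cell_width, cell_padding) for w in text_widths)


# The example from the text: a 150-wide string in a 100-wide cell needs 2 lines.
print(estimate_lines(150, 100))
# A 99-wide string no longer fits on one line once 1 unit of padding is reserved.
print(estimate_lines(99, 100, cell_padding=1))
# The row takes the height of its tallest cell.
print(lines_for_row([20, 150, 240], 100))
```

In practice the widths would come from `pdf.get_string_width(...)` and the padding from `pdf.c_margin`.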

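Now, with the line height for the header settled, we can iterate through the column names and print each one. I'll use style "B" and size 14 for the headers (defined earlier). Note that the `new_x`, `new_y` and `max_line_height` arguments of `multi_cell` exist in fpdf2 only, not in the original PyFPDF package.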

for i in df.columns:
    # new_x="RIGHT", new_y="TOP" keeps the cursor on the same row for the next cell
    pdf.multi_cell(w=cell_width, h=line_height * number_lines * 1.5,
                   txt=str(i), align="C", border="B", new_x="RIGHT", new_y="TOP",
                   max_line_height=line_height)
pdf.ln(line_height * 1.5 * number_lines)

After that, we iterate through the whole DataFrame and, for each row, create cells with its content. Each row must account for differences in text length and therefore in the number of lines. The process is the same as before: we go through the row to calculate the number of lines needed, then use that value to size the cells.

Before printing the body of the table, I'm removing the bold style.

# Changing font style
pdf.set_font("Helvetica", style= "", size= 14)

# Creating our table row by row
for index, row in df.iterrows():
    # Find the largest number of lines needed by any cell in this row
    number_lines = 1
    for i in range(len(df.columns)):
        # .iloc makes the positional lookup explicit; row[i] is deprecated for labelled Series
        new_number_lines = math.ceil(pdf.get_string_width(str(row.iloc[i])) / cell_width)
        if new_number_lines > number_lines:
            number_lines = new_number_lines

    # Print every cell of the row with the same height
    for i in range(len(df.columns)):
        pdf.multi_cell(w=cell_width, h=line_height * number_lines * 1.5,
                       txt=str(row.iloc[i]), align="C", border="B",
                       new_x="RIGHT", new_y="TOP", max_line_height=line_height)
    pdf.ln(line_height * 1.5 * number_lines)

pdf.output("table.pdf")
