11

I have to create a PDF file from an HTML source. Currently, I'm struggling with a problem concerning special (Polish) characters in the output file: they are missing.

HTML source:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<table width="100%" border="0.5" align="center" cellpadding="0" style="border-collapse:collapse; border:1px solid black; font-family:Arial, Helvetica, sans-serif; font-size:16px">
  <tr>
    <td align="center" ><b>Test: ąęłóćńśŁÓŃĆŻŹąśżźłęó</b></td>
  </tr>
</table>
</body>
</html>

Java source:

Document document = new Document(PageSize.A4, 38, 38, 50, 38);  
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("iTextExample.pdf"));  
document.open();  
HTMLWorker htmlWorker = new HTMLWorker(document);  
htmlWorker.parse(new StringReader(readFileAsString("index.html")));  
document.close();


public static String readFileAsString(String filePath) throws IOException {
    DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
    try {
        long len = new File(filePath).length();
        if (len > Integer.MAX_VALUE) {
            throw new IOException("File " + filePath + " too large, was " + len + " bytes.");
        }
        byte[] bytes = new byte[(int) len];
        dis.readFully(bytes);
        return new String(bytes, "UTF-8");
    } finally {
        dis.close();
    }
}
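
As an aside, the helper above can be written more compactly. This is only a sketch, assuming Java 7 or later; the class name HtmlFileReader is made up for illustration. The important part is unchanged: the bytes are decoded explicitly as UTF-8, so the Polish characters survive the read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; not part of iText.
public class HtmlFileReader {

    // Reads the whole file and decodes it as UTF-8,
    // independent of the platform default charset.
    public static String readFileAsString(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```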

My question is: how do I change the default font (Helvetica) to e.g. Arial Bold in the whole PDF document?

I've tested many examples involving StyleSheet and none of them worked. I need to change the default font because it has no Polish characters; that's the solution I hope is going to work.

Edit:

class defaultFontProvider extends FontFactoryImp {

    private String _default;

    public defaultFontProvider(String def) {
        _default = def;
    }

    public Font getFont(String fontName, String encoding, boolean embedded, float size, int style, BaseColor color, boolean cached) {
        if (fontName == null || size == 0) {
            fontName = _default;
        }

        return super.getFont(fontName, encoding, embedded, size, style, color, cached);
    }
}

The code above embeds arial.ttf, which is OK, but how do I make it the default font (instead of Helvetica) for the whole document?

Then:

Map<String,Object> providers = new HashMap<String, Object>();

defaultFontProvider dfp = new defaultFontProvider("arial.ttf");

providers.put(HTMLWorker.FONT_PROVIDER, dfp);

HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.setProviders(providers);
Willi Mentzel
monczek
  • Can we do this when creating a `PDF/A-2` document using iText from HTML and CSS files? – S_S Oct 10 '18 at 11:15
  • @Sumit Look at this example: https://developers.itextpdf.com/examples/archiving-and-accessibility-itext5/pdfa-2 – monczek Oct 10 '18 at 11:39
  • This example does not create the document from HTML and CSS files. I am stuck when using XMLWorker; could you take a look at my question https://stackoverflow.com/q/52736441/3169868 – S_S Oct 10 '18 at 11:45

6 Answers

3

Idea #1

One answer immediately springs to mind: Change iText. Specifically, Font.getCalculatedBaseFont, line 644.

String fontName = BaseFont.HELVETICA;

Actually, I don't think that will work unless you also change the way fonts are created... Line 712

cfont = BaseFont.createFont(fontName, encoding, false);

Unless a font is one of the "Base 14", you have to provide a path to the font's file rather than a simple font name.

Another option: XSLT

Transform the input such that you add a font definition to the style of any node that contains text.
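
A rough sketch of that XSLT idea, using only the JDK's built-in javax.xml.transform API. Assumptions: the input is well-formed XHTML, "Arial" is just a placeholder font family, and the class name FontStyleInjector is made up. Elements that already carry a style attribute are left alone, and whether HTMLWorker then honors the injected style is a separate question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; not part of iText.
public class FontStyleInjector {

    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copy everything through unchanged by default.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Elements with a non-blank text child and no style attribute get one.
        + "<xsl:template match='*[text()[normalize-space()] and not(@style)]'>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='@*'/>"
        + "<xsl:attribute name='style'>font-family: Arial</xsl:attribute>"
        + "<xsl:apply-templates select='node()'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XHTML string and returns the result.
    public static String addFontStyle(String xhtml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(
            addFontStyle("<table><tr><td><b>Test</b></td></tr></table>"));
    }
}
```

The transformed string can then be handed to the parser in place of the original HTML.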

Finally: register a fontProvider

You can sit on top of FontFactoryImp and simply map blank strings to your font of choice.

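import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactoryImp;

class DefaultFontProvider extends FontFactoryImp {
  // "default" is a reserved word in Java, so the field needs another name.
  private final String defaultFontName;

  public DefaultFontProvider(String def) {
    defaultFontName = def;
  }

  // I believe this is the correct override, but there are quite a few others.
  @Override
  public Font getFont(String fontName, String encoding, boolean embedded, float size, int style, BaseColor color, boolean cached) {
    // Fall back to the default font when no font name was requested.
    if (fontName == null || fontName.length() == 0) {
      fontName = defaultFontName;
    }
    return super.getFont(fontName, encoding, embedded, size, style, color, cached);
  }
}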


Map<String,Object> providers = new HashMap<String, Object>();
providers.put(HTMLWorker.FONT_PROVIDER, new DefaultFontProvider("Arial Bold"));

myHTMLWorker.setProviders(providers);

This strikes me as the most Technically Sound idea. It's written for the freshly released 5.0.6 version of iText. Previous versions set the font provider via setInterfaceProps() instead. "Providers" is more of a name change than anything else at this point. I suspect that will no longer be the case in 5.1.

PS: FontFactoryImp has two public members you might be interested in as well: defaultEncoding and defaultEmbedding. You should be able to tweak the defaultEncoding to something more Polish-friendly. I recommend "Identity-H" (aka BaseFont.IDENTITY_H), but that does force all your fonts to be embedded subsets, thus ignoring defaultEmbedding, and making your files a bit larger than if the fonts weren't embedded at all.
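
To see why the encoding matters, here is a small JDK-only sketch (no iText involved). As far as I know, iText's default font encoding is Cp1252 (BaseFont.WINANSI), and that code page simply has no slots for most Polish letters; Cp1250, the Windows Central European code page, does. The class name EncodingCheck is made up, and the sample characters are written as Unicode escapes to keep the source ASCII-only.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper; not part of iText.
public class EncodingCheck {

    // Returns the characters of text that the named charset cannot encode.
    public static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // a-ogonek, e-ogonek, l-stroke, o-acute, c-acute, n-acute, s-acute
        String sample = "\u0105\u0119\u0142\u00f3\u0107\u0144\u015b";
        System.out.println("windows-1252 cannot encode: "
                + unencodable(sample, "windows-1252"));
        // windows-1250 is not one of the charsets every JVM must ship.
        if (Charset.isSupported("windows-1250")) {
            System.out.println("windows-1250 cannot encode: "
                    + unencodable(sample, "windows-1250"));
        }
    }
}
```

Running the same check against your real HTML text is a quick way to tell whether an encoding change alone could help, before touching any font settings.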


Two possible problems:

  1. Explicitly requesting "Helvetica".

    To be sure, I suggest stuffing System.out.println("Requested font: " + fontName); into the beginning of your getFont function. This will let you see all the font calls and make sure all your fonts are replaced correctly. If "Helvetica" is being requested explicitly, you can just test for it and replace it with _default.

  2. Your fontFactory might not be finding anything for "Arial Bold" and so falls back to the default (Helvetica again).

    I think you need to call dfp.registerDirectories(). That'll ferret out all the fonts on several different operating systems, and let you reference them all by font name rather than by path (which is what a FontFactoryImp is supposed to do in the first place).
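
The name check from the first point can be sketched without iText, so the logic can be verified in isolation. FontNameResolver and resolve are made-up names; inside your getFont override you would call something like this on fontName before delegating to super.getFont.

```java
// Hypothetical helper; not part of iText.
public class FontNameResolver {

    // Returns fallback when no font was requested or when Helvetica
    // was requested explicitly; otherwise returns the requested name.
    public static String resolve(String requested, String fallback) {
        if (requested == null) {
            return fallback;
        }
        String trimmed = requested.trim();
        if (trimmed.isEmpty() || "Helvetica".equalsIgnoreCase(trimmed)) {
            return fallback;
        }
        return requested;
    }

    public static void main(String[] args) {
        // Log each decision, in the spirit of the println suggested above.
        String[] requests = { null, "", "Helvetica", "Times" };
        for (String name : requests) {
            System.out.println("Requested font: " + name
                    + " -> using: " + resolve(name, "arial.ttf"));
        }
    }
}
```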

Mark Storer
  • I tried to get it working using your Idea #1, but with no satisfying results. – monczek Feb 11 '11 at 14:32
  • I converted the Arial TTF to AFM, copied it to resources, then changed BaseFont.HELVETICA to point to Arial. In the end I got a PDF, but with the error "The font Arial MT contains a bad /BBox." and #$#$% instead of Polish characters. The only positive is Arial in the output document :) – monczek Feb 11 '11 at 14:40
  • I found this: Ok. You have a bad font. Fonts are complicated structures, and your particular font doesn't have all of the required internal entries. "BBox" stands for Bounding Box, and is an array of numbers that describe the smallest box that would contain all of the characters in the font. A PostScript Type 1 font should, on a PC system, be contained in a file with a "pfb" or "pfa" extension. It will be accompanied by a "pfm" file, but that isn't strictly part of the font. – monczek Feb 11 '11 at 14:44
  • So, you aren't doing anything wrong. Your code works, and the PDF looks great. It's a bad/corrupt font. Speak to whoever supplied the font, or find a different font supplier. – monczek Feb 11 '11 at 14:44
  • Did you try registering a font provider? That looks like the Correct Way to do it. – Mark Storer Feb 11 '11 at 19:13
  • I've tried DefaultFontProvider and it looks promising, but it still doesn't work as expected. The output PDF has the desired font embedded, but Helvetica is still there too, and the document doesn't use the provided TTF font. Any ideas? I think we are close... :) – monczek Feb 14 '11 at 14:33
  • Can you add the code of your font provider? Or did I get it right? – Mark Storer Feb 14 '11 at 17:22
2

This will not change the default font; it only affects the tag to which you apply it.

E.g. <body>

Add a style attribute to the HTML tag:

<tag style="font-family: Arial Unicode MS, FreeSans; font-size:16px; font-weight: normal;">Здраво दी फोंट डाउनलोड Ravi Parekh! </tag>

Note: The system must be able to find ARIALUNI via XMLWorkerFontProvider.

XMLWorkerHelper.getInstance().parseXHtml(writer, document, new ByteArrayInputStream(html.getBytes(Charset.forName("UTF-8"))), null,Charset.forName("UTF-8"), new XMLWorkerFontProvider("/fonts/"));

sample:SamplePDF

Ravi Parekh
1

In your style tag, perhaps you can use CSS3 to change the font:

<style>
@font-face {
font-family: myFont;
src: url(Filename);
}
</style>

Not sure if this is what you were asking.

Brendan
0

If you want to use different fonts in the PDF, include a font package.

I had an issue where some Turkish characters were not printed in the PDF document, so I included a font package in the PDF like this.

Vinit Patel
0

You can add a stylesheet to the HTML parser. That should solve the font and diacritics problem, but you have to choose a font that actually contains the characters you need.

            HttpContext.Current.Response.ContentType = "application/pdf";
            HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=" + HttpUtility.UrlPathEncode(name));
            HttpContext.Current.Response.Cache.SetCacheability(HttpCacheability.NoCache);
            StringWriter sw = new StringWriter();
            HtmlTextWriter hw = new HtmlTextWriter(sw);
            Page.RenderControl(hw);
           
            StringReader sr = new StringReader(sw.ToString());
            Document pdfDoc = new Document(header.Length > 7 ? header.Length > 14 ? header.Length > 21 ? PageSize.A3.Rotate() : PageSize.A3 : PageSize.A4.Rotate() : PageSize.A4, 10f, 10f, 10f, 0f);
            HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
            string sylfaenpath = Environment.GetEnvironmentVariable("SystemRoot") + "\\fonts\\sylfaen.ttf";
            FontFactory.Register(sylfaenpath, "sylfaen");

            htmlparser.SetStyleSheet(GenerateStyleSheet());

            PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream);
            
            pdfDoc.Open();
            htmlparser.Parse(sr);
            pdfDoc.Close();
            HttpContext.Current.Response.Write(pdfDoc);
            HttpContext.Current.ApplicationInstance.CompleteRequest();


    private static StyleSheet GenerateStyleSheet()
    {
        StyleSheet css = new StyleSheet();

        css.LoadTagStyle("body", "face", "sylfaen");
        css.LoadTagStyle("body", "encoding", "Identity-H");
        css.LoadTagStyle("body", "size", "13pt");
        
        css.LoadTagStyle("h1", "size", "30pt");
        css.LoadTagStyle("h1", "style", "line-height:30pt;font-weight:bold;");
        css.LoadTagStyle("h2", "size", "22pt");
        css.LoadTagStyle("h2", "style", "line-height:30pt;font-weight:bold;margin-top:5pt;margin-bottom:12pt;");
        css.LoadTagStyle("h3", "size", "15pt");
        css.LoadTagStyle("h3", "style", "line-height:25pt;font-weight:bold;margin-top:1pt;margin-bottom:15pt;");
        css.LoadTagStyle("h4", "size", "13pt");
        css.LoadTagStyle("h4", "style", "line-height:23pt;margin-top:1pt;margin-bottom:15pt;");
        css.LoadTagStyle("hr", "width", "100%");
        css.LoadTagStyle("a", "style", "text-decoration:underline;");
        return css;
    }
laica211
-1

    protected void pdfButton_Click(object sender, EventArgs e)
    {
        List<ShowCourse> showCourses = new List<ShowCourse>();
        CourseManager aCourseManager = new CourseManager();
        int departmentId = Convert.ToInt16(departmentDropDownList.Text);
        int semesterId = Convert.ToInt16(semesterDropDownList.Text);
        showCourses = aCourseManager.GetScheduleCoursesByDepartmentIdAndSemester(departmentId, semesterId);

        Document doc = new Document(iTextSharp.text.PageSize.LETTER, 10, 10, 42, 35);
        string pdfFilePath = Server.MapPath("CoursePdf.pdf");
        PdfWriter wri = PdfWriter.GetInstance(doc, new FileStream(pdfFilePath, FileMode.Create));
        doc.Open(); //Open Document to write
        iTextSharp.text.Font font8 = FontFactory.GetFont("ARIAL", 7);
        string heading = " \t\t                                          Course Schedule Details for Department: " +
                                departmentDropDownList.SelectedItem;
        Paragraph reportHeading = new Paragraph(heading);
        if (showCourses != null)
        {
            PdfPTable PdfTable = new PdfPTable(6);
            PdfPCell PdfPCell = null;
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Course Code", font8)));
            PdfTable.AddCell(PdfPCell);
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Course Name", font8)));
            PdfTable.AddCell(PdfPCell);
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Semester Name", font8)));
            PdfTable.AddCell(PdfPCell);
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Course Credit", font8)));
            PdfTable.AddCell(PdfPCell);
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Assign To", font8)));
            PdfTable.AddCell(PdfPCell);
            PdfPCell = new PdfPCell(new Phrase(new Chunk("Schedule", font8)));
            PdfTable.AddCell(PdfPCell);

            foreach (ShowCourse aCourse in showCourses)
            {
                PdfPCell = new PdfPCell(new Phrase(new Chunk(aCourse.courseCode, font8)));
                PdfTable.AddCell(PdfPCell);
                PdfPCell = new PdfPCell(new Phrase(new Chunk(aCourse.courseName, font8)));
                PdfTable.AddCell(PdfPCell);
                PdfPCell = new PdfPCell(new Phrase(new Chunk(aCourse.semesterName, font8)));
                PdfTable.AddCell(PdfPCell);
                PdfPCell = new PdfPCell(new Phrase(new Chunk((aCourse.credit).ToString(), font8)));
                PdfTable.AddCell(PdfPCell);
                PdfPCell = new PdfPCell(new Phrase(new Chunk(aCourse.teacherName , font8)));
                PdfTable.AddCell(PdfPCell);
                PdfPCell = new PdfPCell(new Phrase(new Chunk(aCourse.schedule, font8)));
                PdfTable.AddCell(PdfPCell);
            }
            PdfTable.SpacingBefore = 15f; // Give some space after the text or it m
            doc.Add(reportHeading); // add paragraph to the document
            doc.Add(PdfTable); // add pdf table to the document
            doc.Close();
            // string pdfPath = Server.MapPath("~/SomePDFFile.pdf");
            WebClient client = new WebClient();
            Byte[] buffer = client.DownloadData(pdfFilePath);
            Response.ContentType = "application/pdf";
            Response.AddHeader("content-length", buffer.Length.ToString());
            Response.BinaryWrite(buffer);
        }

    }
sandipon