Browsing various forums and Stack Overflow questions, I found no answer with a comprehensive solution to the special-characters problem. I try to provide one here, at the cost of a rather long answer. Hopefully this will help someone out...
I used XMLWorker from SourceForge, as HtmlWorker became deprecated. The problem with special characters remained, though. I found two solutions that actually work; they can be used separately or combined.
HTML & CSS solution
Each tag involved needs to have the font-family style specified in order to be interpreted correctly by the ParseXHtml method (I am not sure why style inheritance for nested tags does not work here, but it seems it doesn't, or at least not fully).
This solution allows the resulting PDF to be modified through the HTML and CSS alone, so some changes can be made without recompiling the code.
Simplified code (for an MVC app) would look like this:
Controller:
public FileStreamResult GetPdf()
{
const string CONTENT_TYPE = "application/pdf";
var fileName = "mySimple.pdf";
var html = GetViewPageHtmlCode();
//how to capture the view HTML is described in other threads, e.g. [here][2]
var css = Server.MapPath("~/Content/Pdf.css");
using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
{
using (var cssFile = new FileStream(css, FileMode.Open))
{
var memoryStream = new MemoryStream();
//to create landscape, use PageSize.A4.Rotate() for pageSize
var document = new Document(PageSize.A4, 30, 30, 10, 10);
var writer = PdfWriter.GetInstance(document, memoryStream);
var worker = XMLWorkerHelper.GetInstance();
document.Open();
worker.ParseXHtml(writer, document, capturedActionStream, cssFile);
writer.CloseStream = false;
document.Close();
memoryStream.Position = 0;
//to enforce file download
HttpContext.Response.AddHeader(
"Content-Disposition",
String.Format("attachment; filename={0}", fileName));
var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
return wrappedPdf;
}
}
}
CSS:
body {
background-color: white;
font-size: .85em;
font-family: Arial;
margin: 0;
padding: 0;
color: black;
}
p, ul {
margin-bottom: 20px;
line-height: 1.6em;
}
div, span {
font-family: Arial;
}
h1, h2, h3, h4, h5, h6 {
font-size: 1.5em;
color: #000;
font-family: Arial;
}
View layout
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>@ViewBag.Title</title>
<link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="page">
<div id="main">
@RenderBody()
</div>
</div>
</body>
</html>
View page
@{
ViewBag.Title = "PDF page title";
}
<h1>@ViewBag.Title</h1>
<p>
ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
</p>
Inside-code font-replacing solution
In this solution, the font returned by an IFontProvider is replaced with one that contains a (correct) representation of the special characters, and the BaseFont.IDENTITY_H encoding is used. The advantage of this approach is that exactly one font is used. That is also its disadvantage.
Also, this solution expects the font to be part of the project (*.ttf file(s) placed in the Content/Fonts folder).
Alternatively, the fonts can be retrieved from the Windows fonts location, Environment.GetFolderPath(Environment.SpecialFolder.Fonts); this requires knowledge (or a strong belief) of which fonts are installed on the server, or control over the server.
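The two font locations mentioned above could be combined roughly as follows. This is only a sketch: the FontLocator class and its ResolveFontPath method are hypothetical names that are not part of iTextSharp or of the original code, and the fallback order (project font first, then the system fonts folder) is my assumption.

```csharp
using System;
using System.IO;

public static class FontLocator
{
    // Hypothetical helper: prefers the font shipped with the project and
    // falls back to the fonts folder of the operating system.
    public static string ResolveFontPath(string projectFontPath, string fontFileName)
    {
        if (File.Exists(projectFontPath))
        {
            return projectFontPath;
        }

        // May be an empty string on platforms without a system fonts folder.
        var systemFontsFolder = Environment.GetFolderPath(Environment.SpecialFolder.Fonts);
        var systemFontPath = Path.Combine(systemFontsFolder, fontFileName);
        if (File.Exists(systemFontPath))
        {
            return systemFontPath;
        }

        throw new FileNotFoundException(
            "Font not found in the project or in the system fonts folder.", fontFileName);
    }
}
```

The returned path could then be handed to the font factory constructor in place of a hard-coded location; failing early with an exception seems preferable to silently producing a PDF with missing characters.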
FontProvider (over FontFactory)
I took the liberty of extending Gregor S's solution a bit to provide a more versatile FontFactory that can be used for a variety of HTML "templates" pushed through XMLWorker.
public class CustomFontFactory : FontFactoryImp
{
public const Single DEFAULT_FONT_SIZE = 12;
public const Int32 DEFAULT_FONT_STYLE = 0;
public static readonly BaseColor DEFAULT_FONT_COLOR = BaseColor.BLACK;
public String DefaultFontPath { get; private set; }
public String DefaultFontEncoding { get; private set; }
public Boolean DefaultFontEmbedding { get; private set; }
public Single DefaultFontSize { get; private set; }
public Int32 DefaultFontStyle { get; private set; }
public BaseColor DefaultFontColor { get; private set; }
public Boolean ReplaceEncodingWithDefault { get; set; }
public Boolean ReplaceEmbeddingWithDefault { get; set; }
public Boolean ReplaceFontWithDefault { get; set; }
public Boolean ReplaceSizeWithDefault { get; set; }
public Boolean ReplaceStyleWithDefault { get; set; }
public Boolean ReplaceColorWithDefault { get; set; }
public BaseFont DefaultBaseFont { get; protected set; }
public CustomFontFactory(
String defaultFontFilePath,
String defaultFontEncoding = BaseFont.IDENTITY_H,
Boolean defaultFontEmbedding = BaseFont.EMBEDDED,
Single? defaultFontSize = null,
Int32? defaultFontStyle = null,
BaseColor defaultFontColor = null,
Boolean automaticalySetReplacementForNullables = true)
{
//set default font properties
DefaultFontPath = defaultFontFilePath;
DefaultFontEncoding = defaultFontEncoding;
DefaultFontEmbedding = defaultFontEmbedding;
DefaultFontColor = defaultFontColor == null
? DEFAULT_FONT_COLOR
: defaultFontColor;
DefaultFontSize = defaultFontSize.HasValue
? defaultFontSize.Value
: DEFAULT_FONT_SIZE;
DefaultFontStyle = defaultFontStyle.HasValue
? defaultFontStyle.Value
: DEFAULT_FONT_STYLE;
//set default replacement options
ReplaceFontWithDefault = false;
ReplaceEncodingWithDefault = true;
ReplaceEmbeddingWithDefault = false;
if (automaticalySetReplacementForNullables)
{
ReplaceSizeWithDefault = defaultFontSize.HasValue;
ReplaceStyleWithDefault = defaultFontStyle.HasValue;
ReplaceColorWithDefault = defaultFontColor != null;
}
//define default font
DefaultBaseFont = BaseFont.CreateFont(DefaultFontPath, DefaultFontEncoding, DefaultFontEmbedding);
//register system fonts
FontFactory.RegisterDirectories();
}
protected Font GetBaseFont(Single size, Int32 style, BaseColor color)
{
var baseFont = new Font(DefaultBaseFont, size, style, color);
return baseFont;
}
public override Font GetFont(String fontname, String encoding, Boolean embedded, Single size, Int32 style, BaseColor color, Boolean cached)
{
//replace the requested font properties with the defaults where configured
size = ReplaceSizeWithDefault
? DefaultFontSize
: size;
style = ReplaceStyleWithDefault
? DefaultFontStyle
: style;
encoding = ReplaceEncodingWithDefault
? DefaultFontEncoding
: encoding;
embedded = ReplaceEmbeddingWithDefault
? DefaultFontEmbedding
: embedded;
//get font
Font font = null;
if (ReplaceFontWithDefault)
{
font = GetBaseFont(
size,
style,
color);
}
else
{
font = FontFactory.GetFont(
fontname,
encoding,
embedded,
size,
style,
color,
cached);
if (font.BaseFont == null)
font = GetBaseFont(
size,
style,
color);
}
return font;
}
}
Controller
private const String DEFAULT_FONT_LOCATION = "~/Content/Fonts";
private const String DEFAULT_FONT_NAME = "arialn.ttf";
public FileStreamResult GetPdf()
{
const string CONTENT_TYPE = "application/pdf";
var fileName = "mySimple.pdf";
var html = GetViewPageHtmlCode();
//how to capture the view HTML is described in other threads
var css = Server.MapPath("~/Content/Pdf.css");
using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
{
using (var cssFile = new FileStream(css, FileMode.Open))
{
var memoryStream = new MemoryStream();
var document = new Document(PageSize.A4, 30, 30, 10, 10);
//to create landscape, use PageSize.A4.Rotate() for pageSize
var writer = PdfWriter.GetInstance(document, memoryStream);
var worker = XMLWorkerHelper.GetInstance();
var defaultFontPath = Server
.MapPath(Path
.Combine(
DEFAULT_FONT_LOCATION,
DEFAULT_FONT_NAME));
var fontProvider = new CustomFontFactory(defaultFontPath);
document.Open();
worker.ParseXHtml(writer, document, capturedActionStream, cssFile, fontProvider);
writer.CloseStream = false;
document.Close();
memoryStream.Position = 0;
//to enforce file download
HttpContext.Response.AddHeader(
"Content-Disposition",
String.Format("attachment; filename={0}", fileName));
var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
return wrappedPdf;
}
}
}
CSS:
body {
background-color: white;
font-size: .85em;
font-family: "Trebuchet MS", Verdana, Helvetica, Sans-Serif;
margin: 0;
padding: 0;
color: black;
}
p, ul {
margin-bottom: 20px;
line-height: 1.6em;
}
h1, h2, h3, h4, h5, h6 {
font-size: 1.5em;
color: #000;
}
View layout
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>@ViewBag.Title</title>
<link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="page">
<div id="main">
@RenderBody()
</div>
</div>
</body>
</html>
View page
@{
ViewBag.Title = "PDF page title";
}
<h1>@ViewBag.Title</h1>
<p>
ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
</p>
Other useful (re)sources: