5

I am trying to use the jsPDF library (version 1.4.1) to convert text to PDF. The output is sometimes ugly and unreadable because the text contains special characters, such as:

the left single quotation mark U+2018, the right one U+2019, symbols like →, or the ı in Kadıköy. How can I sanitize or normalize such text? Or is there an option in jsPDF that I can use to fix this problem?

update:

To reproduce the problem, just use the string '→Kadıköy' in this example, https://parall.ax/products/jspdf , on line 9. You will see that the arrow is converted to !’ and the ı is converted to 1.

(FYI, Kadıköy is the name of a city: https://en.wikipedia.org/wiki/Kad%C4%B1k%C3%B6y)

Bonnard

3 Answers

5

We can read here:

jsPDF supports finally UTF-8 by having the ability to use custom fonts.

The problem is that a PDF can only display characters for which it has a font containing the right glyphs. That font must be either a system font (available to the PDF reader) or an embedded font, and every single character in the PDF needs a font that covers it. So for each word in a different language within the same PDF, you have to set a font that supports that language.

Some TTF fonts were created for specific scripts, but not all of them were built correctly according to the standard, and not every TTF font made for a particular script can actually display it in a PDF. For example, a font named "Devanagari" that I found on the internet should support all Hindi letters, but it failed completely.

So we have to find the correct TTF fonts. I found them: in your case, for the string "‘→Kadıköy’", you could use "Courier New" or "Arial Unicode MS".

I searched for each character in your string and found the following lists:

→ – Font support for "Rightwards arrow" (u+2192)

ı – Font support for "Latin small letter dotless I" (u+0131)

‘ – Font support for "Left single quotation mark" (u+2018)

’ – Font support for "Right single quotation mark" (u+2019)

ö – Font support for "Latin small letter o with diaeresis" (u+00F6)
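
To see which characters of a string are likely to need a custom font before looking up font support for each one, a rough check like the following can help. This is only a sketch: the function names are hypothetical, and the U+00FF threshold is a simplifying assumption about what the built-in fonts can encode, not something taken from the jsPDF documentation.

```javascript
// Assumption: characters above U+00FF are treated as "needs a custom font".
// This threshold is a simplification for illustration only.
function findCharsNeedingCustomFont(text) {
    const found = new Set();
    // for...of iterates by code point, so surrogate pairs stay intact
    for (const ch of text) {
        if (ch.codePointAt(0) > 0xFF) {
            found.add(ch);
        }
    }
    return Array.from(found);
}

// Formats a character as a U+XXXX label (hypothetical helper).
function describe(ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(findCharsNeedingCustomFont('‘→Kadıköy’').map(describe));
// prints [ 'U+2018', 'U+2192', 'U+0131', 'U+2019' ]
```

Under this assumption ö (u+00F6) is not reported, while the arrow, the dotless ı and both quotation marks are.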

Solution for most languages of the world

I have created an application that can create PDFs for most of the world's languages.

How to use it:

  1. First, download and extract the free TTF font "Arial Unicode MS".
  2. Run the snippet below and choose the extracted TTF font "Arial Unicode MS" from your folder.
  3. Write the text in your language and click the "Create PDF" button.
  4. The PDF will be downloaded and you can open it.

In some cases your language may not be supported by the TTF font "Arial Unicode MS". The full list of supported languages you can find here. In that case you have to find another TTF font that supports it. But be careful: if the font is under 100 KB, in my experience it does not work with jsPDF (see the beginning of my post).

The application

var fontInBase64 = '',
    fileName = '',
    message = document.querySelector('div'),
    txtForPdf = document.querySelector('textarea'),
    errorStr = '<b style="color:red">Please select a font file!</b>';

function readFile()
{
    var file = document.querySelector('input[type=file]').files[0],
        reader = new FileReader();

    // Use the last dot-separated part so names like "my.font.ttf" are accepted
    if(file && file.name.split('.').pop().toLowerCase() != 'ttf')
    {
        message.innerHTML = errorStr;
        return;
    }

    if(txtForPdf.value.replace(/\s+/g, '').length < 1)
    {
        message.innerHTML = '<b style="color:red">Please write some Text!</b>';
        return;
    }

    reader.onloadend = function()
    {
        fontInBase64 = reader.result.split(',')[1];
        fileName = file.name.replace(/\s+/g, '-');

        createPDF(fileName, fontInBase64);
    }

    if(file) reader.readAsDataURL(file);
    else message.innerHTML = errorStr;
}


function createPDF(fileName, fontInBase64)
{
    var doc = new jsPDF('p','mm','a4'), // comma, not semicolon, so the names below stay local
        fileNameWithoutExtension = fileName.split('.')[0],
        lMargin = 15, // left margin in mm
        rMargin = 15, // right margin in mm
        pdfInMM = 210; // width of A4 in mm

    doc.addFileToVFS(fileName, fontInBase64);
    doc.addFont(fileName, fileNameWithoutExtension, 'normal');

    doc.setFont(fileNameWithoutExtension);
    doc.setFontSize(14);
    var splitParts = doc.splitTextToSize(txtForPdf.value, (pdfInMM - lMargin - rMargin));
    doc.text(15, 15, splitParts);

    doc.save('test.pdf');
}

function setHindiToTextArea()
{
    txtForPdf.value =
    "हिन्दी विश्व की एक प्रमुख भाषा है एवं भारत की राजभाषा है। केंद्रीय स्तर पर भारत में दूसरी आधिकारिक भाषा अंग्रेजी है। यह हिन्दुस्तानी भाषा की एक मानकीकृत रूप है जिसमें संस्कृत के तत्सम तथा तद्भव शब्द का प्रयोग अधिक हैं और अरबी-फ़ारसी शब्द कम हैं। हिन्दी संवैधानिक रूप से भारत की प्रथम राजभाषा और भारत की सबसे अधिक बोली और समझी जाने वाली भाषा है। हालांकि, हिन्दी भारत की राष्ट्रभाषा नहीं है क्योंकि भारत का संविधान में कोई भी भाषा को ऐसा दर्जा नहीं दिया गया था। चीनी के बाद यह विश्व में सबसे अधिक बोली जाने वाली भाषा भी है। विश्व आर्थिक मंच की गणना के अनुसार यह विश्व की दस शक्तिशाली भाषाओं में से एक है। हिन्दी और इसकी बोलियाँ सम्पूर्ण भारत के विविध राज्यों में बोली जाती हैं। भारत और अन्य देशों में भी लोग हिन्दी बोलते, पढ़ते और लिखते हैं। फ़िजी, मॉरिशस, गयाना, सूरीनाम की और नेपाल की जनता भी हिन्दी बोलती है। 2001 की भारतीय जनगणना में भारत में ४२ करोड़ २० लाख लोगों ने हिन्दी को अपनी मूल भाषा बताया। भारत के बाहर, हिन्दी बोलने वाले संयुक्त राज्य अमेरिका में 648,983; मॉरीशस में ६,८५,१७०; दक्षिण अफ्रीका में ८,९०,२९२; यमन में २,३२,७६०; युगांडा में १,४७,०००; सिंगापुर में ५,०००; नेपाल में ८ लाख; जर्मनी में ३०,००० हैं। न्यूजीलैंड में हिन्दी चौथी सर्वाधिक बोली जाने वाली भाषा है";
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/1.4.1/jspdf.min.js" crossorigin="anonymous"></script>
<input type="file" onchange="message.innerHTML='&nbsp;'"><br><br>
<textarea rows="4" cols="75">‘→Kadıköy’</textarea>
<div>&nbsp;</div>
<input type="button" value="Create PDF with UTF support" onclick="readFile()">
<br>
<i>For example</i>:<br><a href="#" onclick="setHindiToTextArea()"><b>Click on this line if you want to set Hindi text in the textarea.</b></a>
Bharata
  • @Bonnard, in your case I would recommend using "Courier New" rather than "Arial Unicode MS". Both support all of your characters, but "Arial Unicode MS" is too big. You can find "Courier New" among your system fonts. Copy it to another folder, and then you will be able to select it in the file chooser. – Bharata Jul 22 '18 at 17:48
  • Hello, how can I use jsPDF with UTF-8 for French special characters? – Imen May 03 '19 at 13:30
4

You can do this by importing a font that supports your special characters.

The basic.js file in the examples shows how to apply it.

(The example uses Cyrillic letters.)

function demoUsingTTFFont() {
    //https://fonts.google.com/specimen/PT+Sans
    var PTSans = "......"; // place the long Base64-encoded string of the font file here
    var doc = new jsPDF();

    doc.addFileToVFS("PTSans.ttf", PTSans);
    doc.addFont('PTSans.ttf', 'PTSans', 'normal');

    doc.setFont('PTSans'); // set font
    doc.setFontSize(10);
    doc.text("А ну чики брики и в дамки!", 10, 10);

    doc.save('test.pdf');
}

As a font family, have a look at Google's Noto.

Source:

https://github.com/MrRio/jsPDF/issues/12 (scroll to the bottom)

mico
  • This does not fix the issue at all. – Bonnard Jul 21 '18 at 01:21
  • Well, this was the only workaround I could find. Many posts do not mention it and just tell you to switch to a different library. Maybe that is the way to go if my suggestion is useless in this scenario. – mico Jul 21 '18 at 09:59
  • I don't understand: var PTSans = “...... “); // place long string of text here – educob Nov 22 '20 at 15:15
  • Look at the answer from @Igor above. There is a Fiddle example. – mico Nov 22 '20 at 16:18
3

IMHO, mico's answer is OK; just replace the PTSans font with the one you use (Base64-encoded). See jsfiddle: https://jsfiddle.net/o0m9pzyv/12/

var PTSans = ...
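
If you are unsure how to get the Base64 string for your own font, one way is a small Node.js script like this. It is a sketch under assumptions: it presumes Node.js is available, the helper names are made up for illustration, and 'PTSans.ttf' is only a placeholder for whatever font file you use.

```javascript
// Encodes raw bytes (Buffer, Uint8Array or array of numbers) as Base64.
// Relies on the global Buffer class that Node.js provides.
function bytesToBase64(bytes) {
    return Buffer.from(bytes).toString('base64');
}

// Reads a font file from disk and returns its contents as a Base64 string.
// The fs module is loaded lazily so the helper above also works on its own.
function fontFileToBase64(path) {
    const fs = require('fs');
    return bytesToBase64(fs.readFileSync(path));
}

// Hypothetical usage; 'PTSans.ttf' is a placeholder file name:
// const PTSans = fontFileToBase64('PTSans.ttf');
// console.log('var PTSans = "' + PTSans + '";');
```

The printed line can then be pasted in place of the elided string before calling addFileToVFS.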
Igor