
I have a UTF-8 encoded file with contents in Tamil (an Indian language). I have to read the contents of the file and make a PDF from them. I am using the ReportLab Python module to do this.

I can open the file and read its contents, and printing them to the terminal displays them perfectly. However, when I write the contents to a PDF using ReportLab, the order of the symbols gets reversed within some characters (those that are a composite of two 'character symbols'). I have set a Tamil font for the ReportLab paragraph style. What am I missing?

import codecs
import random
from os import listdir
from os.path import isfile, join

from reportlab.lib.enums import TA_JUSTIFY
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from tamil import utf8 as tamil  # imported in the original, unused in this snippet

# Register the Tamil font under the name used by the paragraph style below.
pdfmetrics.registerFont(TTFont('Latha', '/home/srinivas/Fonts/latha/latha.ttf'))

PATH = 'tamil_file'
num_sets = 1
pages_per_set = 12
num_articles_per_page = 2

styles = getSampleStyleSheet()
# fontName must match the name passed to registerFont ('Latha').
styles.add(ParagraphStyle(name='CustomPara', fontName='Latha', fontSize=14, alignment=TA_JUSTIFY, leading=24))

style = styles['CustomPara']
styleH = styles['Heading1']

# The file selection was left out of the posted snippet; choosing a random
# file from PATH is an assumption based on the imports.
files = [f for f in listdir(PATH) if isfile(join(PATH, f))]

for set_idx in range(num_sets):
    doc = SimpleDocTemplate(str(set_idx) + '.pdf', pagesize=A4)
    story = []
    for page in range(pages_per_set):
        for article in range(num_articles_per_page):  # assumed inner loop
            selected_file = random.choice(files)
            lines = u''
            story.append(Spacer(1, 0.1 * inch))
            story.append(Paragraph(selected_file, styleH))  # heading text is assumed
            story.append(Spacer(1, 0.1 * inch))
            with codecs.open(join(PATH, selected_file), 'r', 'utf-8') as f:
                for l in f:
                    print(l)  # prints correctly in terminal
                    lines += l
            story.append(Paragraph(lines, style))
        story.append(PageBreak())
    doc.build(story)

Actual text: நாவல் மரத்தின் மருத்துவப் பயன்கள் போற்றத்தக்கவை

Saved wrong text: [image attachment]

Note: If I copy the text from the PDF and paste it here, it displays fine (the wrong text above is an image attachment)!
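
For readers who do not write Tamil, here is a small sketch that lists the code points behind one of the affected syllables. It uses only the standard library; `describe` is a hypothetical helper name, not part of ReportLab. The syllable "வை" (the last one in the sample sentence) is stored as a consonant followed by a vowel sign, although the vowel sign is drawn to the left of the consonant. A renderer that places glyphs in stored order without reordering them would therefore show the two symbols swapped, which may be what is happening here, though that is an assumption and not a confirmed diagnosis.

```python
# -*- coding: utf-8 -*-
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


# "வை": consonant VA followed by vowel sign AI, written with escapes so the
# example does not depend on the source file's encoding.
syllable = u"\u0bb5\u0bc8"

for code, name in describe(syllable):
    print(code, name)
# U+0BB5 TAMIL LETTER VA
# U+0BC8 TAMIL VOWEL SIGN AI
```

Running the same helper on text copied back out of the PDF would show whether the stored order is intact and only the drawing order differs.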

Srinivas
  • It must be an issue with the PDF reader itself. Try other PDF readers and the modules/libraries for that reader. It should work. – SibiCoder Jan 01 '17 at 16:08
  • I tried uploading to Google Drive and it was the same. – Srinivas Jan 01 '17 at 16:51
  • Did you check whether the character mapping of the font you are using is correct? Did you try a different Tamil font? – B8vrede Jan 02 '17 at 15:12
  • @B8vrede Yes, I did. In fact, latha.ttf is a very standard Tamil font. Nevertheless, I tried 3 different fonts. – Srinivas Jan 02 '17 at 15:51
  • In that case, could you create a small example that will allow users (like me) who don't know how to write Tamil to reproduce (and maybe fix) your problem? – B8vrede Jan 02 '17 at 18:32
