
I managed to extract text from a version 1.2 PDF using PdfSharp, following the approach described in this link.

My code to extract the text:

    // Recursively walks a parsed content stream and appends every string
    // operand of the text-showing operators (Tj, TJ) to pdfcontentstr.
    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            // Only Tj and TJ carry text to be shown.
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence)
        {
            var cSequence = cObject as CSequence;
            foreach (var element in cSequence)
            {
                ExtractText(element, ref pdfcontentstr);
            }
        }
        else if (cObject is CString)
        {
            // The raw string value is appended as-is, separated by ';'.
            var cString = cObject as CString;
            pdfcontentstr = pdfcontentstr + ";" + cString.Value;
        }
        return pdfcontentstr;
    }

But when I try to extract text from a version 1.3 PDF (with the same content), the program returns unreadable content, for example:

0%0O0R0F0N00%0

The actual content in the PDF file is: Block B

Can anyone help? Thanks in advance.

Soon Khai
  • Links are frowned upon due to their changing nature. Please include the code you've tried in your question. – STLDev Jan 06 '17 at 02:26
  • Can you share the original text which should have been displayed instead of the unreadable? – Lara Jan 06 '17 at 07:42
  • Could it be that extracting works for ANSI fonts but not for Unicode fonts? With PDF nothing is really simple and simple solutions only work for some PDF files. And you do not even provide PDF files. – I liked the old Stack Overflow Jan 06 '17 at 16:59
  • Sorry, I can't provide the PDF file because it contains P&C data. – Soon Khai Jan 09 '17 at 02:48
  • I will try to find out whether it is related to the suggestion "Could it be that extracting works for ANSI fonts but not for Unicode fonts". – Soon Khai Jan 09 '17 at 02:49
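
The ANSI-versus-Unicode hypothesis raised in the comments can be checked without PdfSharp. The sketch below is a diagnostic only, written under several assumptions: that each character of the extracted string holds one raw byte, that the bytes pair up into big-endian 2-byte codes, and that the garbled sample corresponds to the byte sequence reconstructed in `raw` (the code for the space is a guess). The names `TwoByteCodeSketch`, `ToCodes`, and `ShiftCodes` are hypothetical. Under those assumptions every code sits exactly 29 below the expected character, which is consistent with glyph indices from an embedded subset font rather than character codes.

```csharp
using System;
using System.Text;

public static class TwoByteCodeSketch
{
    // Assumption: each char in 'raw' carries one byte of the PDF string.
    // Pairs the bytes into big-endian 16-bit codes.
    public static ushort[] ToCodes(string raw)
    {
        var codes = new ushort[raw.Length / 2];
        for (int i = 0; i < codes.Length; i++)
        {
            int high = raw[2 * i] & 0xFF;
            int low = raw[2 * i + 1] & 0xFF;
            codes[i] = (ushort)((high << 8) | low);
        }
        return codes;
    }

    // Diagnostic only: shifts every code by a fixed offset.
    // A fixed offset is a heuristic, not a general decoding rule.
    public static string ShiftCodes(ushort[] codes, int offset)
    {
        var sb = new StringBuilder(codes.Length);
        foreach (ushort code in codes)
        {
            sb.Append((char)(code + offset));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumed reconstruction of the garbled sample as raw bytes:
        // 00 25, 00 4F, 00 52, 00 46, 00 4E, 00 03, 00 25
        string raw = "\0%\0O\0R\0F\0N\0\u0003\0%";

        foreach (ushort code in ToCodes(raw))
        {
            Console.Write("0x{0:X4} ", code);
        }
        Console.WriteLine();

        // 'B' is 0x42 and the first code is 0x25, a difference of 29.
        Console.WriteLine(ShiftCodes(ToCodes(raw), 29)); // prints "Block B"
    }
}
```

If the codes in the real file behave this way, the offset should not be relied on for other documents or fonts. The mapping from codes to text, when the file provides one, lives in the font's /ToUnicode CMap, so a robust extractor would need to read that for each font used by the Tj and TJ operators.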

0 Answers