I'm working with a PDF. I need to extract only the Japanese text from the PDF file and then save it to my database as a string.
I've searched Stack Overflow and gone through the first four pages of Google results, but I can't find a solution.
I'm trying pdfparser by smalot (github.com/smalot/pdfparser), but it only outputs unreadable characters.
For example:
\w ���w ��� � /�����yyy/� �Fq�J�yyy/�S�M��dyyy/�q� �Cyyy/�>; �Cyyy/��������yyy/�]b;tKh�yyy/��� ����y /����yyyy/� �� �Cyyyyy/� � a ���yyyy/���� wyyy/� a �Ugyyy/����� e{yyyy/�2�" Copyright(c)2014 Daiichikizai.,Co.,Ltd All rights reserved.
I'm using the Yii framework with PHP 5.5.
I tried utf8_encode(), utf8_decode(), and mb_convert_encoding(), but nothing works.
UPDATE: I tried mb_detect_encoding() and it returns UTF-8, so maybe this isn't an encoding problem.
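A rough way to check that last point is sketched below. The � symbol is U+FFFD, the Unicode replacement character, and it is itself valid UTF-8, so a string full of it can still be reported as UTF-8. The helper `inspectExtractedText()` is hypothetical (it is not part of pdfparser) and the sample strings are made up for illustration. It only reports whether the string is valid UTF-8, how many replacement characters it holds, and how many hiragana, katakana, or kanji characters it holds.

```php
<?php
// Hypothetical helper, not part of pdfparser: summarise what an
// extracted string actually contains.
function inspectExtractedText($text)
{
    $validUtf8 = mb_check_encoding($text, 'UTF-8');

    // U+FFFD is encoded as the bytes EF BF BD, which are valid UTF-8.
    $replacementCount = substr_count($text, "\xEF\xBF\xBD");

    // Count hiragana, katakana and kanji. The /u modifier fails on
    // invalid UTF-8, so only run it when the string is valid.
    $japaneseCount = 0;
    if ($validUtf8) {
        $japaneseCount = (int) preg_match_all(
            '/[\p{Hiragana}\p{Katakana}\p{Han}]/u',
            $text,
            $matches
        );
    }

    return array(
        'valid_utf8'        => $validUtf8,
        'replacement_chars' => $replacementCount,
        'japanese_chars'    => $japaneseCount,
    );
}

// Made-up samples: one resembling the garbled output, one readable.
// This file must be saved as UTF-8 for the second literal to work.
$garbled  = "\xEF\xBF\xBDw \xEF\xBF\xBD\xEF\xBF\xBD Copyright(c)2014";
$readable = "日本語のテキスト";

print_r(inspectExtractedText($garbled));
print_r(inspectExtractedText($readable));
```

If the extracted text comes back as valid UTF-8 with many replacement characters and no Japanese characters, converting the encoding afterwards is unlikely to help, because the original characters were probably already lost during extraction.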
Any suggestions would be deeply appreciated.