
I need a library to read text from files such as doc, docx, xls, xlsx, hwp, pdf, etc. in C++.

I assume a library exists that supports this, but I have googled it and found none.

I'll keep searching, but can somebody recommend a library to me?

Young Hyun Yoo
  • Possibly related to: http://stackoverflow.com/questions/1161295/reading-docx-in-c – msmith81886 Apr 28 '13 at 18:03
  • What exactly do you want to do by "reading" the files? Just extract the text content (e.g. for a search function), actually "understand" the document, or perhaps just display the content? I doubt there is one library that can read all of these file types, but there is code around that can read the file types you listed, to varying degrees of accuracy and completeness. – Mats Petersson Apr 28 '13 at 18:15
  • By "reading" I mean just extracting the text content. Sorry for the confusion. I was not expecting a single library, but several libraries. Thanks for reading. – Young Hyun Yoo Apr 28 '13 at 18:20
  • This should work for xls/xlsx: http://www.libxl.com/ There is a "gnupdf" project for reading/understanding PDFs. Bear in mind that none of these will simply give you the text; you will have to write code to extract the components into a format that you can use. – Mats Petersson Apr 28 '13 at 18:26
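
To illustrate the last comment's point about having to write the extraction code yourself, here is a minimal sketch for the docx case only. A .docx file is a ZIP archive whose main body lives in word/document.xml, with the visible text stored in `<w:t>` elements. The sketch assumes that XML has already been unzipped into a string by some ZIP library (not shown). The function name `extract_docx_text` is hypothetical, it does not decode XML entities such as `&amp;`, and it is plain string scanning rather than a real XML parser, so treat it as a starting point and not a complete reader.

```cpp
#include <cstddef>
#include <string>

// Collects the text of every <w:t> element in a WordprocessingML string.
// Assumption: `xml` is the already-unzipped content of word/document.xml.
// Each paragraph end (</w:p>) becomes a newline. Entities are not decoded.
std::string extract_docx_text(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) break;  // malformed tail, stop scanning
        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        if (tag == "/w:p") {
            out += '\n';
        } else if (tag == "w:t" || tag.compare(0, 4, "w:t ") == 0) {
            // A text run: copy everything up to the matching close tag.
            std::size_t close = xml.find("</w:t>", end + 1);
            if (close == std::string::npos) break;
            out.append(xml, end + 1, close - end - 1);
            end = close + 5;  // index of the '>' that ends "</w:t>"
        }
        pos = end + 1;
    }
    return out;
}
```

Calling `extract_docx_text` on the contents of word/document.xml returns the paragraphs separated by newlines. The other formats in the question (doc, xls, hwp, pdf) are structured differently and would each need their own library or parser.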

0 Answers