I need help with this, especially since I don't know where to start. I am an IT undergraduate and, along with my groupmates, am currently doing on-the-job training at a company.

SCENARIO: The company asked us to create a program that will generate a report and store it in a database. The database will be MySQL. For the language, we are considering VB.NET, Java, or PHP.

The program must be able to:

  1. generate a report that will be sent by email to an office
  2. store it in a database
  3. collect all reports and collate them
  4. generate a new report, which will then be sent to their main office
  5. then store it in their own database

For now, we are still trying to determine how the program will run and which language is capable of reading and extracting data from a document file (either a Word document or a PDF file).

The company also wants the program to be online-ready for future expansion.

Now, our questions are:

  1. Is there a way to extract data from a PDF or Word file using Java, PHP, or VB, and then store it in the MySQL DB?
    • If there is, can it be implemented without using any 3rd party software?
    • The reason we chose either a PDF or a Word file is that the file should be printable for archive purposes.
  2. Which programming language would make it easiest to achieve the above?

    I would like to apologize if the info I am giving is a bit messed up. I will give additional information once we are able to talk with the company this week.

    If there is a problem with the way I posted this, please forgive me. I am just trying to provide the information as best I can.

Kara
user1468480

2 Answers

I'll answer for Java, as it is what I use at work.

You can easily extract text from Word files or build a new Word file with Apache POI.

As for PDF, iText and PDFBox both do a pretty nice job.

Olivier Coilland
  • How about the CSV file format? I have read something about exporting the MySQL data and saving it in CSV format; however, I don't have a clue what this is or how the format works. I also don't know if the file would be small enough to be emailed. Another thing is that we only want to export part of the data in the DB. Any useful links would help. Thanks a lot – user1468480 Jun 26 '12 at 14:08
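
To illustrate the CSV format asked about in the comment above: a CSV file is plain text with one record per line and fields separated by commas; a field that itself contains a comma, a double quote, or a line break is wrapped in double quotes, with embedded quotes doubled. Below is a minimal sketch in Java using only the standard library. The class name `CsvSketch` and the column names (`report_id`, `office`, `total`) are made up for illustration and are not from the question; in a real program the rows would come from a MySQL query that selects only the part of the data to be exported.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CsvSketch {
    // Quote a field only when needed: wrap it in double quotes if it
    // contains a comma, a quote, or a line break, and double any quotes.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the escaped fields of one record with commas.
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    // Build the whole file content: a header line followed by one line per row.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(toLine(header)).append("\r\n");
        for (List<String> row : rows) {
            sb.append(toLine(row)).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for rows read from the database.
        List<String> header = Arrays.asList("report_id", "office", "total");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Main, Office", "1500"),
                Arrays.asList("2", "Branch \"B\"", "820"));
        System.out.print(toCsv(header, rows));
    }
}
```

Because the result is plain text, a few thousand rows usually amount to well under a megabyte, so emailing it as an attachment is generally not a problem; the exact size depends on the data.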
Why can't you use 3rd party software? If you could, I would recommend something like the approach in "How to read PDF files using Java?".

Or, to read a .doc file: http://www.roseindia.net/tutorial/java/poi/readDocFile.html


Anyway, if you can't use 3rd party tools, why not read the specifications and figure out how to extract the text from PDF, DOC, and DOCX files?

Here you can find the DOC format specification: http://msdn.microsoft.com/en-us/library/cc313118.aspx

Here you can find the PDF format specification: http://www.adobe.com/devnet/pdf/pdf_reference.html
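
As a rough idea of what the no-library route can look like for DOCX specifically: a .docx file is a ZIP archive, and the body text lives in the entry `word/document.xml` inside `w:t` elements grouped into `w:p` paragraphs. The sketch below reads that entry with only the Java standard library. The class and method names (`DocxTextSketch`, `extractText`, `buildSampleDocx`) are hypothetical, and `buildSampleDocx` creates a stripped-down archive that is enough for this demo but is not a complete file that Word would open. The older binary .doc format and PDF are far harder to parse by hand, so this does not carry over to them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocxTextSketch {
    // Namespace of WordprocessingML elements such as w:p and w:t.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Find word/document.xml inside the ZIP bytes and return its text,
    // one line per paragraph.
    static String extractText(byte[] docxBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return paragraphsOf(readAll(zip));
                }
            }
            throw new IllegalArgumentException("word/document.xml not found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parse the XML and concatenate the w:t text runs of each w:p paragraph.
    // Simplification: nested content such as text boxes is not handled specially.
    static String paragraphsOf(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to avoid XML external entity issues.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    out.append(runs.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse word/document.xml", e);
        }
    }

    // Copy the remaining bytes of the current ZIP entry into memory.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Demo helper (hypothetical): builds a minimal in-memory archive with one
    // w:p per argument. The arguments must not contain XML special characters.
    static byte[] buildSampleDocx(String... paragraphs) {
        StringBuilder xml = new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] docx = buildSampleDocx("Monthly report", "Total: 42");
        System.out.print(extractText(docx));
    }
}
```

For a real file, the bytes could be loaded with `java.nio.file.Files.readAllBytes` and passed to `extractText`. Tables, headers, and footnotes are stored in other elements or entries, so a library such as Apache POI remains the more practical choice if third-party code is allowed.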

Good luck!

Community
eboix
  • The reason we can't use 3rd party software is that we need to be able to automate the system. We are also still looking for other ways to export the monthly data stored in the DB and import it into another DB at another location. – user1468480 Jun 26 '12 at 14:10
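
One possible way to move monthly data between two databases without extra software is to exchange it as a plain-text CSV file and parse it on the receiving side. That the data travels as CSV is an assumption here, not something the comment states. The sketch below shows only the parsing step in standard-library Java; the class name `CsvImportSketch` is hypothetical, and it handles quoted fields on a single line but not fields that span several lines. Each parsed record could then be written to the second database with an ordinary `INSERT`.

```java
import java.util.ArrayList;
import java.util.List;

class CsvImportSketch {
    // Split one CSV line into fields, honoring double-quoted fields and
    // doubled quotes ("") inside them.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical line as it might appear in an exported monthly file.
        System.out.println(parseLine("1,\"Main, Office\",1500"));
    }
}
```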