
I am doing a coding project where I am trying to read a file into Java and output information about it. I found code online that does this for PDFs. The line "import org.xml.sax.SAXException;" keeps giving me an error stating that the package org.xml.sax is accessible from more than one module. Can someone help me with this? Sorry to bother you all; I am a new coder just trying to figure this out.

Here is the code:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

import org.xml.sax.SAXException;

public class PDFTika 
{
   public static void main(final String[] args) throws IOException, TikaException
   {
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      FileInputStream inputstream = new FileInputStream(
            new File("/Users/relli/OneDrive/Documents/Asparta/example.pdf"));
      ParseContext pcontext = new ParseContext();

      //parsing the document using PDF parser
      PDFParser pdfparser = new PDFParser(); 
      pdfparser.parse(inputstream, handler, metadata, pcontext);

      //getting the content of the document
      System.out.println("Contents of the PDF :" + handler.toString());

      //getting metadata of the document
      System.out.println("Metadata of the PDF:");
      String[] metadataNames = metadata.names();

      for(String name : metadataNames) 
      {
         System.out.println(name+ " : " + metadata.get(name));
      }
   }
}
  • This happens when you have added the external jars in the ModulePath. – Shovan Jun 18 '19 at 06:50
  • why do you need this line? – Scary Wombat Jun 18 '19 at 06:50
  • @Shovon Das, what do you mean by that? I thought I had to add the external jars in the ModulePath in order for TIKA and JNotify to be used in Eclipse. If not, should I remove them? On my partner's computer, we added the external jars to their library and it was not a problem. – Gabriel Katz Jun 18 '19 at 06:53
  • @ScaryWombat, I think I need the line to be able to throw a SAXException. When I commented out the line, I just got an "unhandled exception type" error instead. – Gabriel Katz Jun 18 '19 at 06:54
  • I cannot see where you are catching or throwing this exception; please show the error. By the way, could it be included in multiple places on your class path? – Scary Wombat Jun 18 '19 at 07:12
  • Check this, It may help you https://stackoverflow.com/questions/46834695/the-package-org-openqa-selenium-is-accessible-from-more-than-one-module – Shovan Jun 18 '19 at 07:21
  • @ScaryWombat, I am happy to show the error. Can I just take a screenshot of my screen and show you? What do you mean when you say that "it is probably being included in multiple places" on my class path? I am sorry for all the questions, I am just new to this. – Gabriel Katz Jun 18 '19 at 07:23
  • Yeah, a screenshot is OK. The class path is the external jars that you mentioned before. – Scary Wombat Jun 18 '19 at 07:26
  • @ShovonDas, I did what you recommended and now the main method is telling me that SAXException is an unhandled exception type – Gabriel Katz Jun 18 '19 at 07:30
  • @ScaryWombat, here is a link to a screenshot. The external jars are now in my class path and the JRE System Library is in my module path. file:///Users/gabrielkatz/Desktop/Screen%20Shot%202019-06-18%20at%2010.30.48%20AM.png – Gabriel Katz Jun 18 '19 at 07:33
  • @GabrielKatz - You are new at this, aren't you? Sorry, but I cannot access your Desktop PC. – Scary Wombat Jun 18 '19 at 07:46
  • @ScaryWombat, I am! Sorry! The computer is now telling me that the line "pdfparser.parse(inputstream, handler, metadata, pcontext);" gives me an unhandled exception type. The computer says that the unhandled exception type is a SAXException. I am happy to send a picture if I can figure out how – Gabriel Katz Jun 18 '19 at 07:57
  • @GabrielKatz the only thing that an import statement does is to allow you to use a shorthand name in the code. Nothing more than that. In this case it allows you to say `public static void main(String[] args) throws SAXException` instead of fully spelling it out as `public static void main(String[] args) throws org.xml.sax.SAXException`. But the problem is not the import statement itself but the fact you are not declaring the exception in the `throws` statement. When you do, you'll also get the error saying the package is defined in multiple modules (which others have already helped you with). – Klitos Kyriacou Jun 18 '19 at 08:41
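Klitos Kyriacou's point, that an import is only a shorthand while the real requirement is the throws clause, can be checked with a small JDK-only sketch. It uses org.xml.sax.SAXException, which the JDK itself ships in its java.xml module, and refers to it by its fully qualified name without any import. The class ThrowsWithoutImport and the helper parseSomething are hypothetical; parseSomething only stands in for a call like pdfparser.parse(...), which declares the same checked exception.

```java
// Note: there is no "import org.xml.sax.SAXException;" in this file.
// The fully qualified name works exactly like the imported short name.
public class ThrowsWithoutImport {

    // Hypothetical stand-in for a call such as pdfparser.parse(...),
    // which declares the checked exception org.xml.sax.SAXException.
    static String parseSomething(boolean fail) throws org.xml.sax.SAXException {
        if (fail) {
            throw new org.xml.sax.SAXException("simulated parse failure");
        }
        return "parsed";
    }

    // Because parseSomething declares a checked exception, the caller must either
    // declare it too (as main does here) or catch it. Leaving both out is a compile
    // error, which Eclipse reports as "Unhandled exception type SAXException".
    public static void main(String[] args) throws org.xml.sax.SAXException {
        System.out.println(parseSomething(false));
    }
}
```

Removing the import therefore cannot fix or cause the "unhandled exception" error; only the throws clause (or a try/catch) decides that.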

1 Answer


Method 1: this code is based on the code provided by Gabriel Katz. I fixed the compile error simply by adding SAXException to the throws clause of main, because PDFParser.parse declares it as a checked exception.

Method 2 is a simplified version that parses only the PDF content.

Code Snippet Info:

This code parses PDF data using the Apache Tika library. It prints the PDF content as a string and prints the metadata of the PDF file.

Method 1: parse PDF and print PDF content and metadata

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;


public class PDFTika {
    public static void main(final String[] args) throws IOException, TikaException, SAXException {
        // path is relative to the working directory the program is started from
        File file = new File("example.pdf");

        // the default BodyContentHandler stops at 100,000 characters and then throws a SAXException;
        // new BodyContentHandler(-1) removes that limit
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();

        //parsing the document using PDF parser; try-with-resources closes the stream afterwards
        try (FileInputStream inputstream = new FileInputStream(file)) {
            PDFParser pdfparser = new PDFParser();
            pdfparser.parse(inputstream, handler, metadata, pcontext);
        }

        //getting the content of the document
        System.out.println("Contents of the PDF :" + handler.toString());

        //getting metadata of the document
        System.out.println("Metadata of the PDF:");
        String[] metadataNames = metadata.names();

        for (String name : metadataNames) {
            System.out.println(name + " : " + metadata.get(name));
        }
    }
}
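If you would rather handle the exceptions inside main than add them to the throws clause, the usual pattern is a try-with-resources block plus a multi-catch. Since Tika is not part of the JDK, this sketch uses the JDK's own SAX parser (javax.xml.parsers.SAXParserFactory) as a stand-in, because its parse method declares the same SAXException alongside IOException. The class CatchInsteadOfThrows and the helper extractText are hypothetical names; with Tika's PDFParser the catch list would be IOException | SAXException | TikaException instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CatchInsteadOfThrows {

    // Collects the character content of an XML document, loosely similar to what
    // Tika's BodyContentHandler does for the body text of a parsed file.
    static String extractText(InputStream in) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try (InputStream stream = in) { // closes the stream even if parsing fails
            SAXParserFactory.newInstance().newSAXParser().parse(stream, handler);
            return text.toString();
        } catch (IOException | SAXException | ParserConfigurationException e) {
            // The checked exceptions are handled here, so callers need no throws clause.
            return "could not parse: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<doc><p>Hello</p><p>world</p></doc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(new ByteArrayInputStream(xml)));
    }
}
```

Catching is mainly useful when the program should keep going after a bad file (for example, when looping over many files); for a one-off test program, declaring the exceptions on main as in Method 1 is simpler.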

Method 2: parse PDF data and print content as a string

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaParser {
    public static void main(String[] args) throws IOException, TikaException {
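        File file = new File("example.pdf");
        FileInputStream inputstream = new FileInputStream(file);
        Tika tika = new Tika();
        // parseToString detects the file type automatically and closes the stream when done;
        // by default the returned text is limited to 100,000 characters (see Tika.setMaxStringLength)
        String fileContent = tika.parseToString(inputstream);
        System.out.println(fileContent);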
    }
}

<!-- Add the following dependencies to your Maven pom.xml -->
    <dependencies>
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parsers</artifactId>
            <version>1.24.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.24.1</version>
        </dependency>
    </dependencies>
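Following up on the comments about jars being included in more than one place: once the dependencies are on your project, you can check which classpath entries actually contain their own copy of org.xml.sax. The JDK already provides that package through its java.xml module, so any jar reported here is a second copy. One plausible cause of the "accessible from more than one module" error in Eclipse is such a copy sitting on the modulepath, which is why moving the external jars to the classpath was suggested above. This is a JDK-only sketch (Java 9+ for Class.getModule()); the class FindSaxCopies and the method entriesContaining are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

public class FindSaxCopies {

    // Returns the classpath entries (directories or .jar files) that contain the given
    // package path, e.g. "org/xml/sax". Jar entry names always use '/' as the separator.
    static List<String> entriesContaining(String classPath, String packagePath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            File f = new File(entry);
            if (f.isDirectory()) {
                if (new File(f, packagePath).isDirectory()) {
                    hits.add(entry);
                }
            } else if (f.isFile() && entry.endsWith(".jar")) {
                try (ZipFile jar = new ZipFile(f)) {
                    boolean found = jar.stream()
                            .anyMatch(z -> z.getName().startsWith(packagePath + "/"));
                    if (found) {
                        hits.add(entry);
                    }
                } catch (IOException e) {
                    System.err.println("Skipping unreadable entry: " + entry);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shows which module the JDK loads org.xml.sax from (java.xml on Java 9+).
        System.out.println("JDK module: " + org.xml.sax.SAXException.class.getModule().getName());
        String cp = System.getProperty("java.class.path");
        for (String hit : entriesContaining(cp, "org/xml/sax")) {
            System.out.println("Also found in: " + hit);
        }
    }
}
```

Run it with the same classpath as your project; if it prints any "Also found in" lines, that jar is a candidate for the duplicate package.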
Aman Srivastava
utkarshp64