
I need to parse standard pcap binary log files. A library for doing this in MR (MapReduce) jobs is already available in a Git repository here.

I also found a sample here.

    ClassNotFoundException: p3.hadoop.mapreduce.lib.input.PcapInputFormat

I get the exception above when I run the sample class.

My pom.xml looks like this:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.4</version>
        <configuration>
            <archive>
                <manifest>
                    <addClasspath>true</addClasspath>
                    <classpathPrefix>lib/</classpathPrefix>
                    <mainClass>com.name.mr.analytics.main.NetworkAnalytics</mainClass>
                </manifest>
            </archive>
        </configuration>
    </plugin>

    <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
            <archive>
                <manifest>
                    <mainClass></mainClass>
                </manifest>
            </archive>
            <descriptorRefs>
                <descriptorRef> jar-with-dependencies </descriptorRef>
            </descriptorRefs>
        </configuration>
        <executions>
            <execution>
                <id>make-assembly</id>
                <phase>package</phase>
                <goals>
                    <goal>single</goal>
                </goals>
            </execution>
        </executions>
    </plugin>

I am packaging all the necessary jars, but somehow Maven overwrites the MANIFEST.MF file: in the jar-with-dependencies I can see only the classpath or the main class in MANIFEST.MF. The other jar that gets created has all the details in its MANIFEST.MF, but of course the dependencies are not available in it.

halfer
Raghuveer
  • Your `<mainClass>` is empty; set it to the main class of your job. Also, please attach the list of contents of the jar file you get as output (the output of `jar tvf` on that jar). – 0x0FFF Nov 05 '14 at 15:06

1 Answer


The problem is that this class cannot be found on the nodes that execute your MapReduce job. You have three main options to make it work:

  1. Pack the hadoop-pcap classes into your jar file (this is easy to do with a GUI IDE).
  2. Put the jar file containing the hadoop-pcap classes on each of the cluster nodes, in one of the CLASSPATH directories (see `yarn.application.classpath`).
  3. Put the jar file containing hadoop-pcap into HDFS (`hdfs dfs -put hadoop_pcap.jar <hdfs path>`) and call `job.addFileToClassPath` with that jar, so that the hadoop-pcap package is shipped together with your job to all the nodes executing it.
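For option #2, here is a minimal sketch of the `yarn-site.xml` change. It assumes a Hadoop 2.x cluster and assumes the hadoop-pcap jar has been copied to `/opt/hadoop-pcap/lib` on every node; that directory is hypothetical. Setting `yarn.application.classpath` replaces the default value rather than appending to it, so the sketch repeats the stock Hadoop 2.x entries first. If your distribution already sets this property, append to its existing value instead.

```xml
<!-- yarn-site.xml (sketch): /opt/hadoop-pcap/lib is a hypothetical directory
     holding the hadoop-pcap jar on every cluster node -->
<property>
  <name>yarn.application.classpath</name>
  <!-- Stock Hadoop 2.x entries first, then the extra hadoop-pcap directory at the end -->
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/opt/hadoop-pcap/lib/*</value>
</property>
```

Depending on the Hadoop version and distribution, MapReduce tasks may also take entries from `mapreduce.application.classpath`. Check which of the two properties your cluster actually uses.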

For production use I'd recommend option #2. While you are still developing the code, I'd recommend trying #1 first.
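For option #1 with the Maven build from the question, a minimal sketch of the assembly plugin section follows. It assumes hadoop-pcap is declared as a normal (compile-scope) dependency in the pom, which the question does not show. The assembly plugin builds its own manifest and does not reuse the `maven-jar-plugin` `<manifest>` settings, so the main class has to be set here as well. `addClasspath` and `classpathPrefix` are not needed for this jar, because the dependency classes are unpacked into the jar itself.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <archive>
            <manifest>
                <!-- Must be filled in here: an empty value gives the assembled jar no usable Main-Class -->
                <mainClass>com.name.mr.analytics.main.NetworkAnalytics</mainClass>
            </manifest>
        </archive>
        <descriptorRefs>
            <!-- Predefined descriptor: unpacks runtime-scope dependencies (e.g. hadoop-pcap) into one jar -->
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

After `mvn package`, submit the `*-jar-with-dependencies.jar` from `target/` rather than the plain jar. Running `jar tvf` on it should list `p3/hadoop/mapreduce/lib/input/PcapInputFormat.class`. The task nodes receive this jar only if the driver tells Hadoop which jar to ship, typically via `job.setJarByClass(...)`.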

0x0FFF
  • Updated the post. I have tried adding the main class to the assembly plugin; it doesn't work. – Raghuveer Nov 05 '14 at 14:57
  • Please list the contents of the jar file you get and your project structure. – 0x0FFF Nov 05 '14 at 15:45
  • Also, here's a list of options for including dependencies when building the project with Maven: http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven – 0x0FFF Nov 05 '14 at 15:52