I want to run a SELECT query on a CSV file using CsvJdbc with MapReduce.

I'm using the following code in a map-only job, but the output file is empty:

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // `csv` (path of the input CSV), `input` (the SQL query) and `out` (a Text)
        // are fields of the mapper class; their declarations are not shown here.
        try {
            // Load the CsvJdbc driver.
            Class.forName("org.relique.jdbc.csv.CsvDriver");

            // CsvJdbc expects the directory that contains the .csv files,
            // so strip the file name from the path.
            String csv1 = csv.substring(0, csv.indexOf("file.csv"));

            // Connection, Statement and ResultSet are closed automatically.
            // A Statement is not thread-safe.
            try (Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + csv1);
                 Statement stmt = conn.createStatement();
                 ResultSet results = stmt.executeQuery(input)) {

                // Emit (value of the first selected column, 1) for every row.
                // Reading a named column such as "id" throws SQLException when
                // the query (e.g. `select place from file`) does not select it.
                while (results.next()) {
                    out.set(results.getString(1));
                    context.write(out, new IntWritable(1));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            // Only printed to the task's stderr log; the task still "succeeds"
            // and writes no output.
            e.printStackTrace();
        }
    }
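
The line that builds `csv1` finds the directory by searching for the literal string `file.csv`, so it breaks as soon as the file is renamed (if the string is absent, `indexOf` returns -1 and `substring` throws). CsvJdbc treats a directory as the database and each file in it as a table named after the file without its `.csv` suffix, so a sketch that splits any path into those two parts with `java.nio.file` could look like this. `CsvPathParts` and `split` are hypothetical names; the sketch assumes the default `.csv` suffix and a path on the local filesystem, which is what CsvJdbc reads (an HDFS path such as `files/file.csv` is not necessarily present on the task node's local disk).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CsvPathParts {

    /** Hypothetical helper: returns {directory, tableName} for a local CSV file path. */
    static String[] split(String csvPath) {
        // Absolute path, so getParent() is never null for a file name.
        Path path = Paths.get(csvPath).toAbsolutePath();
        // The parent directory is what goes after "jdbc:relique:csv:".
        String directory = path.getParent().toString();
        // The table name is the file name without the ".csv" suffix (default suffix assumed).
        String fileName = path.getFileName().toString();
        String table = fileName.endsWith(".csv")
                ? fileName.substring(0, fileName.length() - ".csv".length())
                : fileName;
        return new String[] {directory, table};
    }

    public static void main(String[] args) {
        String[] parts = split("files/file.csv");
        System.out.println("URL:   jdbc:relique:csv:" + parts[0]);
        System.out.println("table: " + parts[1]);
    }
}
```

With these two parts, the query would name the table (`file`) and the connection URL would use the directory, independent of what the CSV file is called.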
  • How are you submitting your MapReduce job? Also, how have you configured the input to your map job? – Amit Sep 01 '16 at 14:45
  • I'm using the following command to run the map job: `hadoop jar /projets/test/exp.jar expl.Csvclass files/file.csv fileoutput select place from file`, where the `input` variable takes the query `select place from file` from the command line. – user2765117 Sep 01 '16 at 14:58
  • And do you have records in `files/file.csv`? – Amit Sep 01 '16 at 15:00
  • Yes, I have records. The output is: `Map-Reduce Framework Map input records=47 Map output records=0 Input split bytes=119 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=56 CPU time spent (ms)=860` – user2765117 Sep 01 '16 at 20:37
  • Have you tried putting some debugging statements in `map()` to make sure it is actually being called? – Amit Sep 02 '16 at 12:30
  • `INFO mapreduce.Job: Job job_1469547325715_0042 running in uber mode : false 16/09/03 11:08:27 INFO mapreduce.Job: map 0% reduce 0% 16/09/03 11:08:36 INFO mapreduce.Job: map 100% reduce 0% 16/09/03 11:08:36 INFO mapreduce.Job: Job job_1469547325715_0042 completed successfully 16/09/03 11:08:36 INFO mapreduce.Job: Counters: 30 ` – user2765117 Sep 03 '16 at 09:09
  • When I execute the query in the main class, I receive the following message: `ClassNotFoundException: org.relique.jdbc.csv.CsvDriver`. The program can't find the CsvJdbc driver! – user2765117 Sep 03 '16 at 11:37
  • Look at the logs for the job ID mentioned above to see whether they also show the exception stack trace. Also, are you bundling the required JAR with `hadoop jar`? And try putting some `System.out` or logger statements in your method to trace the error. – Amit Sep 03 '16 at 13:30
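
The counters above (47 input records, 0 output records, job "completed successfully") are consistent with the `ClassNotFoundException` reported in the comments: if `Class.forName` throws inside `map()`, the catch block only prints the stack trace to the task's stderr log and the mapper writes nothing. Below is a minimal sketch of checking for the driver up front and failing loudly instead, assuming the check would run once (for example in the mapper's `setup()` or in the job's `main`) rather than once per record. `DriverCheck`, `driverAvailable` and `requireDriver` are hypothetical names. Run with only the JDK on the classpath, it reports the CsvJdbc driver as missing, which mirrors a task JVM that doesn't have the CsvJdbc JAR.

```java
public class DriverCheck {

    /** Hypothetical helper: true if the named JDBC driver class can be loaded. */
    static boolean driverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Throws instead of swallowing the error, so a task would fail visibly. */
    static void requireDriver(String className) {
        if (!driverAvailable(className)) {
            throw new IllegalStateException("JDBC driver " + className
                    + " is not on the classpath; ship the CsvJdbc JAR with the job");
        }
    }

    public static void main(String[] args) {
        String driver = "org.relique.jdbc.csv.CsvDriver";
        System.out.println(driver + " available: " + driverAvailable(driver));
        try {
            requireDriver(driver);
        } catch (IllegalStateException e) {
            // In a mapper this exception would be left to propagate and fail the task.
            System.out.println("Would fail the task with: " + e.getMessage());
        }
    }
}
```

How the JAR reaches the classpath depends on the setup; common options are building a single "fat" job JAR that contains CsvJdbc, or passing it with the `-libjars` generic option (which is only parsed when the job's main class goes through `ToolRunner`/`GenericOptionsParser`). The `ClassNotFoundException` in the main class suggests it is missing on the client side too, where `HADOOP_CLASSPATH` is one way to add it.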
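
In the command quoted in the comments, the shell splits `select place from file` into four separate arguments, so the query has to be reassembled from `args[2]` onward; reading only `args[2]` would give just `select`. The job's `main` is not shown in the question, so the following is a sketch under that assumption, with `QueryArgs` and `queryFrom` as hypothetical names. Quoting the query on the command line (`"select place from file"`) is the other way around it.

```java
import java.util.Arrays;

public class QueryArgs {

    /**
     * Hypothetical helper: rebuilds the SQL query from command-line arguments,
     * assuming args[0] is the input CSV, args[1] the output directory,
     * and every remaining argument is one word of the query.
     */
    static String queryFrom(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("usage: <input.csv> <output dir> <query...>");
        }
        return String.join(" ", Arrays.copyOfRange(args, 2, args.length));
    }

    public static void main(String[] args) {
        String[] example = {"files/file.csv", "fileoutput", "select", "place", "from", "file"};
        System.out.println(queryFrom(example)); // select place from file
    }
}
```

A value computed in `main` is also not visible to map tasks through an ordinary (or static) field, because the tasks run in separate JVMs; it would usually be passed through the job configuration, with `conf.set(...)` in the driver and `context.getConfiguration().get(...)` in the mapper.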
