1

I am using the CSVJDBC driver for retrieving results from CSV files. All record fields are interpreted as Strings. How can I utilise the MAX aggregate function in order to obtain the maximim Integer of a column? As far as I know, csvjdbc does not support casting.

Consider this sample file:

sequenceNumber,decimalNumber,randomInteger,email,testNumber
0,0.4868176550817932,560801,cleta.stroman@gmail.com,0.0
1,0.9889360969432277,903488,chelsie.roob@hotmail.com,1.0
2,0.8161798688893893,367870,hardy.waelchi@yahoo.com,2.0
3,0.926163166852633,588581,rafaela.white@hotmail.com,3.0
4,0.05084859872223901,563000,belle.hagenes@gmail.com,4.0
5,0.7636864392027013,375299,joey.beier@gmail.com,5.0
6,0.31433980690632457,544036,cornell.will@gmail.com,6.0
7,0.4061012200967966,41792,catalina.kemmer@gmail.com,7.0
8,0.3541002754332119,196272,raoul.bogisich@yahoo.com,8.0
9,0.4189826302561652,798405,clay.roberts@yahoo.com,9.0
10,0.9076084714059381,135783,angel.white@yahoo.com,10.0
11,0.565716974613909,865847,marlin.hoppe@gmail.com,11.0
12,0.9484076609924861,224744,anjali.stanton@gmail.com,12.0
13,0.05223710002804138,977787,harley.morar@hotmail.com,13.0
15,0.6270851001160621,469901,eldora.schmeler@yahoo.com,14.0

I use the following code snippet:

import org.relique.jdbc.csv.CsvDriver;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;


public class CSVDemo
{
    public static void main(String[] args)
    {
    try
    {
        // Load the driver.
        Class.forName("org.relique.jdbc.csv.CsvDriver");

        // Create a connection. The first command line parameter is
        // the directory containing the .csv files.
        // A single connection is thread-safe for use by several threads.

        String CSVDIRECTORY = "/tmp/csv-directory/";
        String CSVDB ="mediumList";
        Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + CSVDIRECTORY);

        // Create a Statement object to execute the query with.
        // A Statement is not thread-safe.
        Statement stmt = conn.createStatement();

        ResultSet results = stmt.executeQuery("SELECT MAX(decimalNumber) FROM "+CSVDB);

        // Dump out the results to a CSV file with the same format
        // using CsvJdbc helper function
        boolean append = true;
        CsvDriver.writeToCsv(results, System.out, append);

        // Clean up
        conn.close();
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
    }
}

When I execute the query

I get as expected:

MAX([DECIMALNUMBER])
0.9889360969432277

But when I want the maximum sequenceNumber, which is 19 with this

ResultSet results = stmt.executeQuery("SELECT MAX(sequenceNumber)   FROM  "+CSVDB);

I get 9 as a result:

MAX([SEQUENCENUMBER])
9

It works for the decimalNumber and it also works for the text. It does not work for testNumber, as csvjdbs returns the lexicographic largest value instead of the Integer value. Is there a possibility to solve this directly or to I need to fetch all records and select the maximum value with Java?

Basic Solution:

This is my basic solution, which needs to fetch all the numbers first:

        ResultSet results = stmt.executeQuery("SELECT sequenceNumber FROM "+CSVDB);
        int max=-1;

        while(results.next()){
            String sum = results.getString(1);

            int currentSeq = Integer.parseInt(sum);
            System.out.println("current_ "+sum);
            if(currentSeq>max){
                max=currentSeq;
            }

Is there a more elegant way?

Solution based on Joop Eggen

public int getMaxSequenceAggregate() {
       int max = 0;
       try {
           Properties props = new Properties();
           Connection connection;

           props.put("columnTypes", "Int,Double,Int,String,Int");
           connection = DriverManager.getConnection("jdbc:relique:csv:" + this.directoryPath, props);
           PreparedStatement statement = null;
           ResultSet result;
           statement = connection.prepareStatement("SELECT MAX(sequenceNumber) FROM " + this.filePath);
           result = statement.executeQuery();

           while (result.next()) {
               max = result.getInt(1);
               LOGGER.info("maximum sequence: " + max);

           }

           connection.close();
       } catch (SQLException e) {
           e.printStackTrace();
       }

       return max;
   }
Stefan
  • 679
  • 2
  • 9
  • 21
  • Data type? Numeric 19 > 9, but char '9' is > '19'. – jarlh Jul 01 '15 at 14:09
  • Yes, of course you are right. all values are interpreted as Strings in CSV. I did not notice immediately because the decimal example was working by accident. What a silly mistake. I reformulated the question. Thanks for your help. – Stefan Jul 01 '15 at 14:51

2 Answers2

2

You should better specify the column types as it seems the first column is taken as String, where "9" > "10".

Properties props = new Properties();
props.put("columnTypes", "Integer,Double,Integer,String,Integer");
Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + CSVDIRECTORY, props);
Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
0

As follows from the CSV/JDBC documentation:

If columnTypes is set to an empty string then column types are inferred from the data.

Which I guess is desirable in most use cases. So, using Joop Eggen's example this can be simplified as:

Properties props = new Properties();
props.put("columnTypes", "");
Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + CSVDIRECTORY, props);

I tried this and it demonstrates dynamic type detection similar to other JDBC drivers. Wonder why this is not the default setting though.

Denis Abakumov
  • 355
  • 3
  • 11