
I have written a Java program to do the following and would like opinions on my design:

  1. Read data from a CSV file. The file is a database dump with 6 columns.
  2. Write data into a MySQL database table.

The database table is as follows:

    CREATE TABLE MYTABLE
    (
        ID int PRIMARY KEY not null auto_increment,
        ARTICLEID int,
        ATTRIBUTE varchar(20),
        VALUE text,
        LANGUAGE smallint,
        TYPE smallint
    );
  1. I created an object to store each row.
  2. I used OpenCSV to read each row into a list of the objects created in step 1.
  3. I iterate over this list and, using PreparedStatements, write each row to the database.

The solution should be highly amenable to changes in requirements and demonstrate a good approach, robustness and code quality.

Does that design look ok?

Another method I tried was to use the 'LOAD DATA LOCAL INFILE' SQL statement. Would that be a better choice?

EDIT: I'm now using OpenCSV, and it handles the issue of commas inside actual fields. The issue now is that nothing is written to the DB. Can anyone tell me why?

public static void exportDataToDb(List<Object> data) {
    Connection conn = connect("jdbc:mysql://localhost:3306/datadb","myuser","password");

    try{
        PreparedStatement preparedStatement = null;
        String query = "INSERT into mytable (ID, X, Y, Z) VALUES(?,?,?,?);";
        preparedStatement = conn.prepareStatement(query);

        for(Object o : data){   
            preparedStatement.setString(1, o.getId());
            preparedStatement.setString(2, o.getX());
            preparedStatement.setString(3, o.getY());
            preparedStatement.setString(4, o.getZ());
        }
        preparedStatement.executeBatch();

    }catch (SQLException s){
        System.out.println("SQL statement is not executed!");
    }
}
TheCoder
    Reinventing the wheel once again. MySQL could import CSV data directly: http://stackoverflow.com/questions/3025648/import-csv-to-mysql I don't get why you would set up a single line of Java code to import a CSV into a database… – feeela Jan 23 '13 at 08:30
  • @feeela This task is for a job application. Do you think your suggestion is stopping me from demonstrating my coding skills or does it show that I'm aware of external tools? – TheCoder Jan 23 '13 at 08:39
  • What is the (approximate) expected size of your source file? – RandomSeed Jan 23 '13 at 08:43
  • @YaK It's not specified. I presume I have to take into account that the file could get quite large, and that my app can handle that. – TheCoder Jan 23 '13 at 08:45
  • What was the exact question? If the question was "How to import a CSV into MySQL", than using a separate program to do so is a huge overkill and thus a wrong answer. – feeela Jan 23 '13 at 11:35
  • @feeela The question was 'Write a Java application that reads the database dump and then inserts the data back into a database'. Because my CSV file contains commas in fields, I cannot use LOAD DATA. Do you agree? – TheCoder Jan 23 '13 at 12:10
  • Of course you can use any character to separate the columns. Just view the question/answers I've linked above. – feeela Jan 23 '13 at 13:19
  • @feeela I've read the linked page but cannot find a solution. It doesn't mention how to handle files that have commas in the fields themselves. Can you please direct me? I'm lost in this stuff. – TheCoder Jan 23 '13 at 15:04
  • It's in the accepted answer: `FIELDS TERMINATED BY ','` – feeela Jan 23 '13 at 15:16
  • I have tried that and it doesn't work. One of the columns has a comma in it. The app thinks the comma denotes a new column, whereas it's actually a comma within a column. – TheCoder Jan 23 '13 at 15:27
  • E.g. 1, blogs, joe, 30. The app thinks that is 4 columns. It's not. It's only 3. – TheCoder Jan 23 '13 at 15:29
  • Well you'll lose marks for not closing that connection! – James Bassett Jan 24 '13 at 04:02

2 Answers


From a purely algorithmic perspective, and unless your source CSV file is small, it would be better to

  1. prepare your insert statement
  2. start a transaction
  3. load one (or a few) line(s) from the file
  4. insert the small batch into your database
  5. return to 3. while there are lines remaining
  6. commit

This way, you avoid loading the entire dump in memory.

But basically, you would probably be better off using LOAD DATA.
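The batched approach above can be sketched in Java roughly as follows. The JDBC URL, the file path, and the naive `parseLine` splitter are illustrative assumptions, not from the original post; a real CSV parser (or corrected escaping, as discussed in the comments) is needed for fields containing commas.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchLoader {
    private static final int BATCH_SIZE = 500; // rows per executeBatch(); tune as needed

    // Naive splitter, assumed for illustration only: it breaks on commas
    // inside fields, which is exactly the problem discussed above.
    static String[] parseLine(String line) {
        return line.split(",", -1);
    }

    public static void load(String csvPath, String url, String user, String pass) throws Exception {
        String sql = "INSERT INTO MYTABLE (ARTICLEID, ATTRIBUTE, VALUE, LANGUAGE, TYPE)"
                   + " VALUES (?, ?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(url, user, pass);   // 1. prepare
             PreparedStatement ps = conn.prepareStatement(sql);
             BufferedReader in = new BufferedReader(new FileReader(csvPath))) {
            conn.setAutoCommit(false);                 // 2. start a transaction
            String line;
            int pending = 0;
            while ((line = in.readLine()) != null) {   // 3. one line at a time
                String[] f = parseLine(line);
                ps.setInt(1, Integer.parseInt(f[0]));
                ps.setString(2, f[1]);
                ps.setString(3, f[2]);
                ps.setShort(4, Short.parseShort(f[3]));
                ps.setShort(5, Short.parseShort(f[4]));
                ps.addBatch();                         // queue the row; don't execute yet
                if (++pending == BATCH_SIZE) {         // 4. insert a small batch
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch();        // 5. flush the remainder
            conn.commit();                             // 6. commit
        }
    }
}
```

Only `BATCH_SIZE` rows are ever held by the driver at once, so memory use stays flat regardless of file size, and the try-with-resources block closes the connection in every case.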

RandomSeed
  • I'm going down the LOAD DATA route. My issue now is that the app is inserting extra rows in the table; rows not in the CSV file. Any idea why this is happening? I suspect it's to do with the auto increment of the ID field. – TheCoder Jan 23 '13 at 10:21
  • @user999353 If the unwanted rows are empty, you probably have a problem with the end-of-line (to be defined with a `LINES TERMINATED BY` clause). Be aware that EOLs are system-dependent. – RandomSeed Jan 23 '13 at 10:31
  • You're correct. My line terminator is a comma and one of the columns is a sentence with a comma in it. Any suggestions how to get round this? – TheCoder Jan 23 '13 at 10:36
  • @user999353 Then the CSV file is malformed; commas should be escaped with a backslash. – RandomSeed Jan 23 '13 at 10:48
  • So I should use the ESCAPED keyword? – TheCoder Jan 23 '13 at 10:50
  • @heyya99 If the CSV file uses something other than the standard backslash as its escape character, yes. Otherwise, the `ESCAPED BY` clause is optional. – RandomSeed Jan 23 '13 at 10:58
  • I'm a little confused. My csv file is split by commas. Some actual fields have commas inside. Isn't it impossible for it to know which comma is a split and which is a comma in a normal sentence? – TheCoder Jan 23 '13 at 11:03
  • @heyya99 Indeed, this is impossible. Your CSV file is flawed, it should include escape characters. – RandomSeed Jan 23 '13 at 11:03
  • Ok. So I'll have to abandon the LOAD FILE idea. Would you recommend openCSV? – TheCoder Jan 23 '13 at 11:05
  • @heyya99 You will run into the same problem with any parser (how could openCSV distinguish a comma inside a field from a comma used as a separator?). You'd better correct the CSV generator. I do not know openCSV. – RandomSeed Jan 23 '13 at 11:11

If the number of rows is huge, the code will fail at step 2 with an out-of-memory error. You need to read the rows in chunks and execute a batch with the prepared statement for each chunk, continuing until all rows are processed. This works for any number of rows, and the batching also improves performance. Other than that, I don't see any issue with the design.
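A minimal sketch of that chunking idea (the `Chunker` helper below is illustrative, not from the post): pull a fixed number of rows at a time from an iterator, so only one chunk is ever in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Chunker {
    // Drains up to `size` elements from `it` into a new list.
    // Returns an empty list once the iterator is exhausted.
    public static <T> List<T> nextChunk(Iterator<T> it, int size) {
        List<T> chunk = new ArrayList<>(size);
        while (it.hasNext() && chunk.size() < size) {
            chunk.add(it.next());
        }
        return chunk;
    }
}
```

Each chunk would then be bound to the prepared statement with `addBatch()` and flushed with `executeBatch()` before the next chunk is read, rather than materialising the whole file as a list first.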

Abdullah Shaikh