I have a 20 GB text file that I would like to read, storing the data into a database. The problem is that when I try to load it, the program is terminated before it can print anything to show what it is doing, and it seems this might be due to the size of the file. If anyone has suggestions on how to read this file efficiently, please share them.
-
Store **HOW**? As a blob? Tear it apart into individual fields/records? – Marc B Jul 15 '15 at 21:21
-
Sounds like you're trying to load the whole file and only then handle it. Storing 20GB in memory is not the right thing to do – read the file line by line and store it as you go. You can store batches of a few lines at a time, but not the whole file. – Nir Alfasi Jul 15 '15 at 21:25
-
Is there a class that will split up the file into smaller chunks? That way it can be read. – Jay Jul 15 '15 at 21:34
-
possible duplicate of [Read large files in Java](http://stackoverflow.com/questions/2356137/read-large-files-in-java) – o11c Jul 16 '15 at 00:09
3 Answers
From another post, [Read large files in Java](http://stackoverflow.com/questions/2356137/read-large-files-in-java):
First, if your file contains binary data, then using BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it's text data and you need to split it along linebreaks, then using BufferedReader is OK (assuming the file contains lines of a sensible length).
Regarding memory, there shouldn't be any problem if you use a decently sized buffer (I'd use at least 1MB to make sure the HD is doing mostly sequential reading and writing).
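For example, here is a minimal sketch of a line-by-line read with an explicitly sized 1 MB buffer (the file name huge.txt is a placeholder, not from the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineReader {
    public static void main(String[] args) throws IOException {
        // 1 MB buffer so the disk does mostly sequential reads
        try (BufferedReader reader = new BufferedReader(new FileReader("huge.txt"), 1024 * 1024)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // handle one line at a time; the whole file is never held in memory
            }
        }
    }
}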
If speed turns out to be a problem, you could have a look at the java.nio packages – those are supposedly faster than java.io.
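As a sketch of the NIO route (my example, not the original poster's code), a FileChannel can read the file in 1 MB chunks into a ByteBuffer; again the file name is a placeholder:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelReader {
    public static void main(String[] args) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024); // 1 MB chunks
        try (FileChannel channel = FileChannel.open(Paths.get("huge.txt"), StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();  // switch from filling the buffer to reading it
                // consume buffer contents here (e.g. parse records)
                buffer.clear(); // reset for the next read
            }
        }
    }
}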
As for loading it into a database, make sure you use some sort of bulk-loading API; otherwise it will take forever.
Here is an example of a bulk loading routine I use for Netezza ...
private static void executeBulkLoad(
        Connection connection,
        String schema,
        String tableName,
        File file,
        String filename,
        String encoding) throws SQLException {
    String filePath = file.getAbsolutePath();
    String logFolderPath = filePath.replace(filename, "");
    // Netezza external-table load: streams the file over the JDBC
    // connection (REMOTESOURCE 'JDBC') instead of inserting row by row.
    String sql =
          "INSERT INTO " + schema + "." + tableName + "\n"
        + "SELECT * FROM\n"
        + "EXTERNAL '" + filePath + "'\n"
        + "USING\n"
        + "(\n"
        + "    ENCODING '" + encoding + "'\n"
        + "    QUOTEDVALUE 'NO'\n"
        + "    FILLRECORD 'TRUE'\n"
        + "    NULLVALUE 'NULL'\n"
        + "    SKIPROWS 1\n"
        + "    DELIMITER '\\t'\n"
        + "    LOGDIR '" + logFolderPath + "'\n"
        + "    REMOTESOURCE 'JDBC'\n"
        + "    CTRLCHARS 'TRUE'\n"
        + "    IGNOREZERO 'TRUE'\n"
        + "    ESCAPECHAR '\\'\n"
        + ");";
    try (Statement statement = connection.createStatement()) {
        statement.execute(sql);
    }
}
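A hypothetical call, assuming an open JDBC connection and a tab-delimited file (the schema, table, path, and encoding here are made up for illustration):

executeBulkLoad(connection, "MYSCHEMA", "MY_TABLE", new File("/data/huge.txt"), "huge.txt", "UTF8");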

-
Do you have any resources for a bulk-loading API or previous questions that could be of help? – Jay Jul 15 '15 at 21:36
If you need to load the information into a database, you can use Spring Batch. With it you can read your file, manage transactions, run processing over your records, persist rows into the database, and control how many records to write per commit. I think it is a better option, because the first problem is reading the large file, but your next problem will be managing the database transactions, controlling the commits, etc. I hope it helps you.
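To illustrate the chunked-commit idea without the full Spring Batch setup, here is a minimal plain-JDBC sketch; the connection URL, table name, file name, and batch size are all assumptions for the example:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedLoader {
    private static final int BATCH_SIZE = 10_000; // records per commit (assumed)

    public static void main(String[] args) throws IOException, SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:yourdb://host/db"); // placeholder URL
             BufferedReader reader = Files.newBufferedReader(Paths.get("huge.txt"), StandardCharsets.UTF_8);
             PreparedStatement ps = conn.prepareStatement("INSERT INTO my_table (line) VALUES (?)")) {
            conn.setAutoCommit(false); // commit manually, one chunk at a time
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                ps.setString(1, line);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send the chunk to the database
                    conn.commit();     // keep each transaction small
                }
            }
            ps.executeBatch(); // flush the final partial chunk
            conn.commit();
        }
    }
}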

If you are reading a very large file, always prefer streaming it with InputStreams, e.g.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// The original snippet's undefined `conn` is replaced with a FileInputStream so it is self-contained.
try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("huge.txt")))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process each line here; do not accumulate the whole file in a StringBuilder
    }
}
