1NF
First of all, I think your table design is wrong because it isn't 1NF compliant. Every field should only contain atomic attributes, but that's not the case. Why not a table like:
CREATE TABLE my_table (
    id int,
    ip inet,
    port int
)
Where id is the line number in the source file and ip/port is one of the addresses on that line?
Sample data:
 id |     ip     | port
----+------------+------
  1 | 10.10.10.1 |   80
  1 | 10.10.10.2 |  443
  2 | 10.10.10.3 | 8080
  2 | 10.10.10.4 | 4040
...
Hence, you will be able to query your database on a single address (find all the associated addresses, check whether two addresses are on the same line, or whatever else you might want...).
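For instance, finding every address that shares a line with a given address becomes a simple self-join. A minimal sketch, assuming this normalized my_table and an open java.sql.Connection named connection:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Find every address that appears on the same line as 10.10.10.1
// (assumes the normalized my_table(id, ip, port) sketched above).
String sql = "SELECT t2.ip, t2.port"
        + " FROM my_table t1"
        + " JOIN my_table t2 ON t2.id = t1.id"
        + " WHERE t1.ip = ?::inet";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setString(1, "10.10.10.1");
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString("ip") + ":" + rs.getInt("port"));
        }
    }
}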
Load the data
But let's assume you know what you are doing. The main issue here is that your input data file has a special format: it might pass for a one-column CSV file, but it would be a very degenerate one. Anyway, you have to transform the lines before you insert them into the database. You have two options:
- you read each line of the input file and make an INSERT for it (this may take a while);
- you convert the input file into a text file with the expected format and use COPY.
Insert one by one
The first option seems easy: for the first row of the CSV file, {(10.10.10.1,80),(10.10.10.2,443)}, you have to run the query:
INSERT INTO my_table (addresses) VALUES (ARRAY[('10.10.10.1',80),('10.10.10.2',443)]::address[])
To do so, you just have to create a new string:
String value = row.replaceAll("\\{", "ARRAY[")
                  .replaceAll("\\}", "]::address[]")
                  .replaceAll("\\(([0-9.]+),", "('$1',");
String sql = String.format("INSERT INTO my_table (addresses) VALUES (%s)", value);
And execute the query for every line of the input file (or, for better security, use a prepared statement).
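A possible prepared-statement version (a sketch, assuming the addresses column and the address composite type from above; the whole array is passed as one text literal and cast server-side):
import java.sql.Connection;
import java.sql.PreparedStatement;

// The SQL never changes; only the bound text value does.
String sql = "INSERT INTO my_table (addresses) VALUES (?::address[])";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    // e.g. for the first row of the file (note the quotes around each element):
    ps.setString(1, "{\"(10.10.10.1,80)\",\"(10.10.10.2,443)\"}");
    ps.executeUpdate();
}
In a loop over the file you would reuse the statement and call addBatch()/executeBatch() instead of executeUpdate().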
Insert with COPY
I will elaborate on the second option. In your Java code, you have to call:
copyManager.copyIn(sql, from);
Where sql is a COPY FROM STDIN statement and from is a Reader. The statement will be:
COPY my_table (addresses) FROM STDIN WITH (FORMAT text);
To feed the copy manager, you need data like this (note the quotes):
{"(10.10.10.1,80)","(10.10.10.2,443)"}
{"(10.10.10.3,8080)","(10.10.10.4,4040)"}
With a temporary file
The simplest way to get the data into the right format is to create a temporary file: read each line of the input file, replace ( with "( and ) with )", and write the processed line to the temporary file. Then pass a reader on this file to the copy manager.
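A minimal sketch of this approach, assuming a hypothetical input file input.txt and a copyManager built as in the last snippet of this answer:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

// Rewrite the file with the quotes COPY expects, then stream it in.
File tmp = File.createTempFile("addresses", ".txt");
try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
     BufferedWriter out = new BufferedWriter(new FileWriter(tmp))) {
    String line;
    while ((line = in.readLine()) != null) {
        // {(10.10.10.1,80),...} -> {"(10.10.10.1,80)",...}
        out.write(line.replace("(", "\"(").replace(")", ")\""));
        out.newLine();
    }
}
try (FileReader from = new FileReader(tmp)) {
    copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH (FORMAT text)", from);
}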
On the fly
With two threads
You can use two threads:
- thread 1 reads the input file, processes the lines one by one and writes them into a PipedWriter;
- thread 2 passes a PipedReader connected to the previous PipedWriter to the copy manager.
The main difficulty is to synchronize the threads in such a way that thread 2 starts to read the PipedReader before thread 1 starts to write data into the PipedWriter. See this project of mine for an example.
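A rough sketch of the two-thread variant, with the same assumptions as before (input.txt, copyManager) and minimal error handling:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PipedReader;
import java.io.PipedWriter;

PipedWriter pipedWriter = new PipedWriter();
PipedReader pipedReader = new PipedReader(pipedWriter); // connect before starting

// Thread 1: rewrite the lines and feed the pipe, then close it to signal EOF.
Thread producer = new Thread(() -> {
    try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
         PipedWriter out = pipedWriter) {
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line.replace("(", "\"(").replace(")", ")\"") + "\n");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
});
producer.start();

// Thread 2 (here, the current thread) drives the COPY until the pipe is closed.
copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH (FORMAT text)", pipedReader);
producer.join();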
With a custom reader
The from reader could be an instance of something like this (naive version):
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

class DataReader extends Reader {
    private final PushbackReader csvFileReader;
    private boolean wasParenthese;

    public DataReader(Reader csvFileReader) {
        this.csvFileReader = new PushbackReader(csvFileReader, 1);
        this.wasParenthese = false;
    }

    @Override
    public void close() throws IOException {
        this.csvFileReader.close();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // rely on read()
        for (int i = off; i < off + len; i++) {
            int c = this.read();
            if (c == -1) {
                return i - off > 0 ? i - off : -1;
            }
            cbuf[i] = (char) c;
        }
        return len;
    }

    @Override
    public int read() throws IOException {
        final int c = this.csvFileReader.read();
        if (c == '(' && !this.wasParenthese) {
            this.wasParenthese = true;
            this.csvFileReader.unread('(');
            return '"'; // emit " before (; the ( itself comes on the next call
        } else {
            this.wasParenthese = false;
            if (c == ')') {
                this.csvFileReader.unread('"');
                return ')'; // emit ); the closing " comes on the next call
            } else {
                return c;
            }
        }
    }
}
(This is a naive version: the right way would be to override only public int read(char[] cbuf, int off, int len), but you would then have to process cbuf to add the quotes and keep track of the extra characters pushed to the right, which is a bit tedious.)
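One alternative sketch, which avoids the in-place bookkeeping described above, is to expand the transformed characters into an intermediate buffer and serve read(char[], int, int) from it:
import java.io.IOException;
import java.io.Reader;

class BufferedDataReader extends Reader {
    private final Reader csvFileReader;
    private final char[] chunk = new char[4096];
    private final StringBuilder pending = new StringBuilder();

    BufferedDataReader(Reader csvFileReader) {
        this.csvFileReader = csvFileReader;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Refill the pending buffer with already-quoted characters.
        while (pending.length() == 0) {
            int n = csvFileReader.read(chunk);
            if (n == -1) {
                return -1;
            }
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (c == '(') {
                    pending.append("\"(");
                } else if (c == ')') {
                    pending.append(")\"");
                } else {
                    pending.append(c);
                }
            }
        }
        // Serve at most len characters from the pending buffer.
        int n = Math.min(len, pending.length());
        pending.getChars(0, n, cbuf, off);
        pending.delete(0, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        csvFileReader.close();
    }
}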
Now, if r is the reader for the file:
{(10.10.10.1,80),(10.10.10.2,443)}
{(10.10.10.3,8080),(10.10.10.4,4040)}
Just use:
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

Class.forName("org.postgresql.Driver");
Connection connection = DriverManager
        .getConnection("jdbc:postgresql://db_host:5432/db_base", "user", "passwd");
CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH (FORMAT text)", new DataReader(r));
On bulk loading
If you are loading a huge amount of data, don't forget the basic tips: disable autocommit, remove indexes and constraints (and recreate them afterwards), and use TRUNCATE and ANALYZE as follows:
TRUNCATE my_table;
COPY ...;
ANALYZE my_table;
This will speed up the loading.
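In JDBC terms, the autocommit part could look roughly like this (a sketch, reusing connection, copyManager, DataReader and r from the snippets above):
import java.sql.Statement;

// Run TRUNCATE + COPY + ANALYZE in a single transaction.
connection.setAutoCommit(false);
try (Statement st = connection.createStatement()) {
    st.execute("TRUNCATE my_table");
    copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH (FORMAT text)", new DataReader(r));
    st.execute("ANALYZE my_table");
    connection.commit();
} catch (Exception e) {
    connection.rollback();
    throw e;
}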