
I have quite a lot of data in JSON files that I need to convert to an SQLite database. I wrote a simple loop that runs an INSERT query for each record, but the queries take a long time. I am a hobbyist and I am learning all the time; I think there must be a more professional solution.

use strict;
use warnings;
use JSON::XS;
use DBI;

my $CCC = 'string';

# Connect to the SQLite database
my $dbh = DBI->connect("dbi:SQLite:dbname=aaaa.db", "", "",
    { RaiseError => 1 }) or die $DBI::errstr;

my $stmt = "CREATE TABLE IF NOT EXISTS $CCC (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    start INTEGER UNIQUE,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    vwp REAL NOT NULL,
    volume REAL NOT NULL,
    trades INTEGER NOT NULL
)";
my $sth = $dbh->prepare( $stmt );
my $rv = $sth->execute() or die $DBI::errstr;

if($rv < 0) {
  print $DBI::errstr;
}

# Slurp file.json into a string
my $data = do {
    open(my $fh, '<', 'file.json') or die "cannot open file: $!";
    local $/;
    <$fh>;
};

# Decode the JSON text into a Perl data structure (an array ref of hashes)
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
my $json = $coder->decode($data);

# Loop: insert every hash ({}) from file.json with its own INSERT statement
foreach (@{$json}) {
  my $stmt = "INSERT OR IGNORE INTO $CCC values (null, strftime('%s',\'$_->{T}\'), $_->{O}, $_->{H}, $_->{L}, $_->{C}, ".(($_->{H}+$_->{L}+$_->{C}+$_->{O}) / 4).", $_->{V}, 1)";
  my $sth = $dbh->prepare( $stmt );
  my $rv = $sth->execute() or die $DBI::errstr;

  if($rv < 0) {
    print $DBI::errstr;
  }
}

file.json:

[
  {"O": 1.0, "H": 1.0, "L": 0.00014, "C": 0.000145, "V": 176703.92394752, "T": "2018-02-16T00:00:00", "BV": 25.71390226},
  {"O": 0.00014499, "H": 0.00014499, "L": 0.00011101, "C": 0.00012599, "V": 247646.2068748, "T": "2018-02-16T00:05:00", "BV": 30.66246148},
  {"O": 0.00012599, "H": 0.0001295, "L": 0.000122, "C": 0.00012699, "V": 102563.86201627, "T": "2018-02-16T00:10:00", "BV": 12.88322597}
]

Do you know of a more efficient way?

J. Doe
  • People have experimented with this kind of thing in the past, e.g. https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite – Chris Turner Apr 20 '18 at 13:19
  • I had an answer [here](https://stackoverflow.com/questions/33556610/dbdsqlite-fastest-way-to-insert-multiple-rows/33557453#33557453) that solved this problem. – Chris Charley Apr 20 '18 at 15:34
  • It works! The query with autocommit = 0 finished in a minute, while the other two attempts (without this parameter) had been running for 30 minutes and were still unfinished. Out of curiosity, I will wait for other ideas. Should there be a difference in performance between your loop and mine? – J. Doe Apr 20 '18 at 15:48
  • @J.Doe: Executing `PRAGMA journal_mode = MEMORY` should make a significant difference. – Borodin Apr 20 '18 at 17:28
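
For reference, a minimal sketch of what the two comments above suggest: opening the handle with AutoCommit disabled so all inserts run in one transaction, plus the journal-mode pragma. It assumes the same aaaa.db database and insert loop as in the question.

use strict;
use warnings;
use DBI;

# Sketch based on the comments above: AutoCommit => 0 makes every INSERT
# join one big transaction instead of committing row by row, and the
# pragma keeps the rollback journal in memory.
my $dbh = DBI->connect("dbi:SQLite:dbname=aaaa.db", "", "",
    { RaiseError => 1, AutoCommit => 0 }) or die $DBI::errstr;

$dbh->do("PRAGMA journal_mode = MEMORY");

# ... run the INSERT loop from the question here ...

$dbh->commit;    # write all rows in a single transaction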

1 Answer


I think you should build the statement and prepare it outside the foreach loop (using placeholders), then execute inside the loop. The prepare normally only needs to be done once. Your current code prepares a new statement handle on every iteration:

foreach (@{$json}) {
  my $stmt = "INSERT OR IGNORE INTO $CCC values (null, strftime('%s',\'$_->{T}\'), $_->{O}, $_->{H}, $_->{L}, $_->{C}, ".(($_->{H}+$_->{L}+$_->{C}+$_->{O}) / 4).", $_->{V}, 1)";
  my $sth = $dbh->prepare( $stmt );
  my $rv = $sth->execute() or die $DBI::errstr;

  if($rv < 0) {
    print $DBI::errstr;
  }
}

Perhaps this:

my $stmt = "INSERT OR IGNORE INTO $CCC VALUES (?, strftime('%s', ?), ?, ?, ?, ?, ?, ?, ?)";
my $sth = $dbh->prepare( $stmt );

foreach (@{$json}) {
    $sth->execute(undef, $_->{T}, $_->{O}, $_->{H}, $_->{L}, $_->{C},
        ($_->{H} + $_->{L} + $_->{C} + $_->{O}) / 4, $_->{V}, 1) or die $DBI::errstr;
}

Update: placed undef as first parameter as suggested by Kjetil S.
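
A further sketch, combining the prepared statement above with the single-transaction advice from the comments under the question (assuming the same $dbh, $CCC and $json as in the question):

my $sth = $dbh->prepare(
    "INSERT OR IGNORE INTO $CCC VALUES (?, strftime('%s', ?), ?, ?, ?, ?, ?, ?, ?)"
);

$dbh->begin_work;    # suspend AutoCommit until the commit below

foreach my $row (@{$json}) {
    # prepare once, execute many; RaiseError makes execute die on failure
    $sth->execute(undef, $row->{T}, $row->{O}, $row->{H}, $row->{L}, $row->{C},
        ($row->{H} + $row->{L} + $row->{C} + $row->{O}) / 4, $row->{V}, 1);
}

$dbh->commit;    # one transaction for all rows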

Chris Charley