
I am currently working on a program that reads data out of a huge (40,000+ line) *.txt file. The lines look like this:

4,Barbarendorf,505,552,1575899232,378,0
5,Der+letzte+macht+das+Licht+aus,484,458,1576458064,5757,0

in general:

$id, $name, $x, $y, $player, $points, $rank

The function I wrote to get this data into the SQLite database is the following:

void ThreadMethod()
{
    string sql = "";
    SQLiteConnection m_dbConnection;
    m_dbConnection = new SQLiteConnection("Data Source=villages.db;Version=3;");
    m_dbConnection.Open();
    SQLiteCommand command = new SQLiteCommand(sql, m_dbConnection);
    try
    {
        using (StreamReader sr = new StreamReader("village.txt"))
        {
            String line;
            while ((line = File.ReadLines("village.txt").Last()) != null)
            {
                Regex regex = new Regex(",");
                string[] substrings = regex.Split(line);
                int i = 0;
                string[] strVillage = new string[7];
                foreach (string match in substrings)
                {
                    strVillage[i++] = match;
                }
                sql = "INSERT INTO villages (villageID, villageName, xCoord, yCoord, playerName, villagePoints, villageRank) values (" + strVillage[0] + ", '" + strVillage[1] + "', " + strVillage[2] + ", " + strVillage[3] + ", '" + strVillage[4] + "', " + strVillage[5] + ", " + strVillage[6] + ")";
                command = new SQLiteCommand(sql, m_dbConnection);
                command.ExecuteNonQuery();
                var lines = System.IO.File.ReadAllLines("village.txt");
                System.IO.File.WriteAllLines("village.txt", lines.Take(lines.Length - 1).ToArray());
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
    m_dbConnection.Close();
}

It works, but it is really slow. I hope you guys can help me to improve the performance. Best Regards!

KJenkins
  • What have you benchmarked, which part specifically is slow? Use `string.Split()`, use bulk inserts (http://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite). What have you tried? – CodeCaster Mar 17 '15 at 11:26
  • Have you considered just looking for a CSV library you can reuse? – Sandy Chapman Mar 17 '15 at 11:37

1 Answer


An immediate and marked performance improvement can be made by using a compiled regex instead of creating a new Regex instance for every line. In the class that contains the method:

public class ClassThatThisMethodIsIn 
{
    private static Regex regex = new Regex(",", RegexOptions.Compiled);

    // rest of code goes here
}

And in the method remove the line:

Regex regex = new Regex(",");
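
The existing `regex.Split(line)` call inside the loop then resolves to the shared static field, so no other change is needed there:

string[] substrings = regex.Split(line); // now uses the single compiled instance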

If splitting on comma is the only reason you're using Regex though, delete the Regex entirely and instead use string.Split:

var substrings = line.Split(',');

Further performance can be gained by batching the inserts into a single INSERT ... SELECT UNION statement that adds multiple rows at a time, instead of issuing one INSERT for every line.
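
A minimal sketch of that batching idea, keeping the table layout and string building from the question for brevity (`BatchSize` is an arbitrary value I've picked; SQLite caps a compound SELECT at 500 terms by default):

// Sketch only: queue rows and flush them as one INSERT ... SELECT ... UNION ALL statement.
// Requires using System.Collections.Generic;
const int BatchSize = 500;
List<string> batch = new List<string>(BatchSize);

// Inside the while loop, after splitting the line, queue the row instead of executing it:
batch.Add("SELECT " + strVillage[0] + ", '" + strVillage[1] + "', " + strVillage[2] + ", "
        + strVillage[3] + ", '" + strVillage[4] + "', " + strVillage[5] + ", " + strVillage[6]);

if (batch.Count == BatchSize)
{
    string sql = "INSERT INTO villages (villageID, villageName, xCoord, yCoord, playerName, villagePoints, villageRank) "
               + string.Join(" UNION ALL ", batch);
    using (var command = new SQLiteCommand(sql, m_dbConnection))
    {
        command.ExecuteNonQuery();
    }
    batch.Clear();
}

// After the loop, run the same INSERT once more for any rows still left in batch.

Each statement sent to SQLite carries fixed overhead, so sending hundreds of rows per statement rather than one cuts that overhead dramatically.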

Steve Lillis