-1

So I have a .txt file with 30million lines for which I hash each with MD5 and SHA1 and insert it into my MySQL database. It works completely fine, but it takes 1 second for each hash. That's 30 million seconds...

The problem: Its not fast enough, 1 per 1 second is too slow.

What I want: Some kind of threading that can speed it up so it does like 20 in 1 second! (maybe that's too unrealistic)

Here's my code, feel free to manipulate: (I'm using MySql.Data as a reference)

        bool success;
        int done = 0;
        string command;
        WebClient putHash = new WebClient();

        string server = "my_server";
        string database = "my_db";
        string uid = "my_dbuser";
        string password = "my_pass";
        string connectionstring = "SERVER=" + server + ";DATABASE=" + database + ";UID=" + uid + ";PASSWORD=" + password + ";";

        MySqlConnection cnn = new MySqlConnection(connectionstring);
        MySqlCommand cmd = new MySqlCommand();

        foreach (string line in File.ReadLines(@"directory with .txt file which has 30million lines"))
        {
                string linefixed = line.Replace(" ", "").Replace("'", "").Replace(";", "").Replace(")", "").Replace("\\", "").Replace("=", "");
                success = false;
                byte[] hashedv = new UTF8Encoding().GetBytes(linefixed);
                byte[] hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(hashedv);
                string encodedinput = BitConverter.ToString(hash).Replace("-", string.Empty).ToLower();

                byte[] bytes = Encoding.UTF8.GetBytes(linefixed);

                var sha1 = SHA1.Create();
                byte[] hashBytes = sha1.ComputeHash(bytes);

                byte[] hexed = hashBytes;

                var sb = new StringBuilder();
                foreach (byte b in hexed)
                {
                    var hex = b.ToString("x2");
                    sb.Append(hex);
                }
                string sha1done = sb.ToString();

                while (!success)
                {
                    try
                    {
                        command = "INSERT INTO passwdhashes (password,MD5,SHA1) VALUES('" + linefixed + "','" + encodedinput + "','" + sha1done + "')" ;
                        cmd.CommandText = command;
                        cmd.Connection = cnn;
                        cnn.Open();
                        cmd.ExecuteNonQuery();
                        cnn.Close();
                        success = true;
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine(ex.Message);
                        cnn.Close();
                    }
                }
            done = done + 1;
            Console.WriteLine("\n" + done);
        }
        Console.ReadKey();
    }
}

}

Raghav J.
  • 53
  • 8
  • Some improvements: Create one instance of MD5 and SHA1 hasher outside the loop. Open/close the DB connection once, outside the loop. Collect multiple rows (f.e. 1000) and bulk/batch insert them. – huysentruitw Oct 27 '15 at 12:12

1 Answers1

0

You can insert multiple rows at a time: https://dev.mysql.com/doc/refman/5.5/en/insert.html

Something Like this: INSERT INTO passwordHashed (password,MD5,SHA1) VALUES(1,2,3),(4,5,6),(7,8,9);

This is sample / pseudo code and might not compile:

List<ElementType> elementsToInsert;

foreach (string line in File.ReadLines(@"file")) {

    elementsToInsert.Add(element);
    if (elementsToInsert.size == 100) {
        // execute SQL insertion
        elementsToInsert.clear();
    }
}

ElementType should be a class with the fields you use for insertion.

Miguel Mesquita Alfaiate
  • 2,851
  • 5
  • 30
  • 56
  • But then I'd have to do several foreaches to replace 4,5,6 and 7,8,9 and make sure I don't hash the same line again by mistake... or is there a way you know I can implement it in C#? – Raghav J. Oct 27 '15 at 12:12
  • @RaghavJ. you create an array with the values to insert, and execute the insert at every N elements. Note: The 1,2,3,4,5,6,7,8,9 are just sample values. – Miguel Mesquita Alfaiate Oct 27 '15 at 12:14
  • you don't need multiple foreaches, just keep the values to insert on a list and do a bulk/batch insert when the list reaches 1000 rows f.e. Just don't forget to insert the remainder once the foreach is done :) – huysentruitw Oct 27 '15 at 12:15
  • Can you please manipulate my code and show me what you mean? I'm having trouble understanding... – Raghav J. Oct 27 '15 at 12:20
  • The "duplicate" does not work, I tested by putting the entire thing in a for loop. Still takes 1 second per entry. – Raghav J. Oct 27 '15 at 12:21
  • @RaghavJ. added sample / pseudo code for the mechanism – Miguel Mesquita Alfaiate Oct 27 '15 at 12:23
  • @BlunT I see what you mean! I'll add this ASAP and get back to you, thanks! – Raghav J. Oct 27 '15 at 13:07