
I have 1000 CSV files; each contains 5500 rows of data in 8 columns. My code below takes 5 minutes to import a single CSV file into a MySQL database. I know there are many questions like this out there, and I've been looking into them. But do you have any suggestions for improving this import process, given my code?

private void btn_fetchSharePrices_Click(object sender, EventArgs e)
{
    string[] fileCSV = Directory.GetFiles(sourceDirCSV);
    foreach (string csv in fileCSV)
    {
        try
        {
            string[] lines = File.ReadAllLines(csv);
            foreach (var line in lines)
            {
                var data = line.Split(new[] { ',' }, 8);
                DateTime prices_date = DateTime.Parse(data[0].Trim());
                DateTime prices_time = DateTime.Parse(data[1].Trim());
                string open = data[2].Trim();
                string high = data[3].Trim();
                string low = data[4].Trim();
                string close = data[5].Trim();
                int volume = int.Parse(data[6].Trim());
                int tickers_ticker_id = int.Parse(data[7].Trim());
                StoreRecord_FetchSharePrices(prices_date, prices_time, open, high, low, close, volume, tickers_ticker_id);
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }  
    }
}

private void StoreRecord_FetchSharePrices(DateTime prices_date, DateTime prices_time, string open, string high, string low, string close, int volume, int tickers_ticker_id)
{
    using (var connection = new MySqlConnection(strProvider))
    using (var command = connection.CreateCommand())
    {
        connection.Open();
        command.CommandText = @"INSERT IGNORE INTO prices (Prices_Date, Prices_Time, Prices_Open, Prices_High, Prices_Low, Prices_Close, Prices_Volume, Tickers_Ticker_ID) VALUES (@Prices_Date, @Prices_Time, @Prices_Open, @Prices_High, @Prices_Low, @Prices_Close, @Prices_Volume, @Tickers_Ticker_ID)";
        command.Parameters.AddWithValue("@Prices_Date", prices_date);
        command.Parameters.AddWithValue("@Prices_Time", prices_time);
        command.Parameters.AddWithValue("@Prices_Open", open);
        command.Parameters.AddWithValue("@Prices_High", high);
        command.Parameters.AddWithValue("@Prices_Low", low);
        command.Parameters.AddWithValue("@Prices_Close", close);
        command.Parameters.AddWithValue("@Prices_Volume", volume);
        command.Parameters.AddWithValue("@Tickers_Ticker_ID", tickers_ticker_id);
        command.ExecuteNonQuery();
    }
}
merv
  • If you are connecting to the same database for every file, you need to open the connection once, create a prepared statement once, and reuse that prepared statement. You are currently opening the connection and creating a new command for every line in every file. That is going to kill performance. – OldProgrammer Jun 16 '14 at 18:09
  • Should be posted on http://codereview.stackexchange.com/ as the post is about reviewing/improving correctly working code. – Alexei Levenkov Jun 16 '14 at 18:12
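
The first comment's suggestion can be sketched roughly as follows (an untested sketch, not a drop-in fix; it assumes the question's `strProvider` connection string, `sourceDirCSV` field, and `prices` schema). The connection is opened once, the statement is prepared once, and each file's rows are committed in a single transaction instead of one implicit commit per row:

```csharp
private void ImportAllFiles()
{
    using (var connection = new MySqlConnection(strProvider))
    {
        connection.Open();
        using (var command = connection.CreateCommand())
        {
            command.CommandText =
                @"INSERT IGNORE INTO prices
                  (Prices_Date, Prices_Time, Prices_Open, Prices_High,
                   Prices_Low, Prices_Close, Prices_Volume, Tickers_Ticker_ID)
                  VALUES (@date, @time, @open, @high, @low, @close, @volume, @tickerId)";
            // Declare each parameter once; only the values change per row.
            command.Parameters.Add("@date", MySqlDbType.Date);
            command.Parameters.Add("@time", MySqlDbType.Time);
            command.Parameters.Add("@open", MySqlDbType.VarChar);
            command.Parameters.Add("@high", MySqlDbType.VarChar);
            command.Parameters.Add("@low", MySqlDbType.VarChar);
            command.Parameters.Add("@close", MySqlDbType.VarChar);
            command.Parameters.Add("@volume", MySqlDbType.Int32);
            command.Parameters.Add("@tickerId", MySqlDbType.Int32);
            command.Prepare();

            foreach (string csv in Directory.EnumerateFiles(sourceDirCSV))
            {
                // One transaction (and one commit) per file, not per row.
                using (var transaction = connection.BeginTransaction())
                {
                    command.Transaction = transaction;
                    foreach (var line in File.ReadLines(csv))
                    {
                        var data = line.Split(new[] { ',' }, 8);
                        command.Parameters["@date"].Value = DateTime.Parse(data[0].Trim());
                        command.Parameters["@time"].Value = DateTime.Parse(data[1].Trim());
                        command.Parameters["@open"].Value = data[2].Trim();
                        command.Parameters["@high"].Value = data[3].Trim();
                        command.Parameters["@low"].Value = data[4].Trim();
                        command.Parameters["@close"].Value = data[5].Trim();
                        command.Parameters["@volume"].Value = int.Parse(data[6].Trim());
                        command.Parameters["@tickerId"].Value = int.Parse(data[7].Trim());
                        command.ExecuteNonQuery();
                    }
                    transaction.Commit();
                }
            }
        }
    }
}
```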

4 Answers


Quicker? Do it in MySQL

load data local infile 'file.csv' into table table_name
 fields terminated by ','
 enclosed by '"'
 lines terminated by '\n'
 (column1, column2, column3,...)

Run a foreach loop over the files in the folder (Directory.EnumerateFiles(Folder_path)) and run the above command for each file, substituting the full file path in place of file.csv.
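
Driving this from C# could look roughly like the following (an untested sketch; it assumes the question's `strProvider` connection string and `sourceDirCSV` field, and that the connection string allows local infiles, e.g. `AllowLoadLocalInfile=true` with MySqlConnector):

```csharp
using (var connection = new MySqlConnection(strProvider))
{
    connection.Open();
    foreach (string csv in Directory.EnumerateFiles(sourceDirCSV, "*.csv"))
    {
        using (var command = connection.CreateCommand())
        {
            // Escape backslashes so Windows paths survive inside the SQL literal.
            string path = csv.Replace(@"\", @"\\");
            command.CommandText =
                $"LOAD DATA LOCAL INFILE '{path}' INTO TABLE prices " +
                "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
                "(Prices_Date, Prices_Time, Prices_Open, Prices_High, " +
                "Prices_Low, Prices_Close, Prices_Volume, Tickers_Ticker_ID)";
            command.ExecuteNonQuery();
        }
    }
}
```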

Typist

Use Directory.EnumerateFiles() rather than Directory.GetFiles()

See documentation:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
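
In practice the change is a one-liner; processing can begin on the first file before the directory scan finishes (`ProcessFile` here is a hypothetical stand-in for the per-file work):

```csharp
// GetFiles materializes the whole array first; EnumerateFiles streams results.
foreach (string csv in Directory.EnumerateFiles(sourceDirCSV, "*.csv"))
{
    ProcessFile(csv); // starts on the first file immediately
}
```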

Matthijs

You can try to use the LOAD DATA INFILE command, see link.

I think it is the best option.

You can write a small program to launch this command for each CSV file.

Ivan Rodriguez

You can use the BULK INSERT command (note: this is SQL Server T-SQL syntax, not MySQL; for MySQL use LOAD DATA INFILE as in the other answers):

BULK INSERT prices
FROM 'your file name'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
Volma