
I have a program that is supposed to retrieve (on start-up) data from a text file. This file may get huge and I was wondering how I could speed up the process and assess its current performance. The code used to retrieve the data is as follows:

void startUpBillsLoading(Bill *Bills)
{
    FILE *BillsDb = 0, *WorkersDb = 0, *PaymentDb = 0;
    BillsDb = fopen("data/bills.db", "r");
    WorkersDb = fopen("data/workers.db", "r");
    PaymentDb = fopen ("data/payments.db", "r");
    char *Buffer = malloc (512);

    if (BillsDb && WorkersDb && PaymentDb)
    {
        int i = 0, j = 0;

        while (fscanf (BillsDb, "%d;%[^;];%[^;];%[^;];%[^;];%d/%d/%d;%d/%d/%d;%d;%f;%f\n",
                &Bills[i].Id,
                Bills[i].CompanyName,
                Bills[i].ClientName,
                Bills[i].DepartureAddress,
                Bills[i].ShippingAddress,
                &Bills[i].Creation.Day,
                &Bills[i].Creation.Month,
                &Bills[i].Creation.Year,
                &Bills[i].Payment.Day,
                &Bills[i].Payment.Month,
                &Bills[i].Payment.Year,
                &Bills[i].NumWorkers,
                &Bills[i].TotalHT,
                &Bills[i].Charges) == 14)
        {
            Bills[i].Workers = 
                malloc (sizeof(Employee)*Bills[i].NumWorkers);

            fscanf (PaymentDb, "%d;%d;%[^;];%[^;];%[^\n]\n",
                    &Bills[i].Id,
                    &Bills[i].PaymentDetails.Method,
                    Bills[i].PaymentDetails.CheckNumber,
                    Bills[i].PaymentDetails.VirementNumber,
                    Bills[i].PaymentDetails.BankName);

            LatestBillId++;
            i++;
        }

        i = 0;
        while (fscanf (WorkersDb, "%d;%[^;];%[^;];%f\n",
                    &Bills[i].Id,   
                    Bills[i].Workers[j].Surname,
                    Bills[i].Workers[j].Name,
                    &Bills[i].Workers[j].Salary) == 4)
        {
            for (int j = 1; j <= Bills[i].NumWorkers-1; j++)
            {
                fscanf (WorkersDb, "%d;%[^;];%[^;];%f\n",
                                &Bills[i].Id,   
                                Bills[i].Workers[j].Surname,
                                Bills[i].Workers[j].Name,
                                &Bills[i].Workers[j].Salary);
            }
            i++;
        }

        fclose(BillsDb);
        fclose(WorkersDb);
        fclose(PaymentDb);
    }
    else
        printf ("\t\t\tUnable to access the bills!\n");

    free (Buffer);
}

I have used the time.h library to measure the time it takes to retrieve all the required data. A bill's data is split across 3 files: bills.db, workers.db and payments.db. Each line of bills.db and of payments.db represents an entire bill, whereas in workers.db the number of lines needed to represent a bill is variable and depends on the number of employees related to the bill.

I created these 3 files in this way:

  • bills.db and payments.db each have 118087 lines (thus as many bills)
  • Each bill was (arbitrarily) set to have 4 workers; therefore, the workers.db file has 118087*4 = 472348 lines.

The time taken by this function to run completely is around 0.9 seconds. How good (or bad) is this time, and how can I improve it?

Jenkinx
  • Just a side note, reading files sequentially is faster. Perhaps this will be useful https://stackoverflow.com/questions/42620323/why-is-reading-multiple-files-at-the-same-time-slower-than-reading-sequentially – Tony Tannous Aug 07 '17 at 20:42
  • For me it's good as it is. If you have to handle larger files later, you might want to load only a part on start-up, and to load the rest only if really needed (maybe asynchronously?). Using an SQL database may be more efficient though. – Antoine C. Aug 07 '17 at 20:43
  • If performance is important, why not use an *actual* database, such as sqlite3?! – Antti Haapala -- Слава Україні Aug 07 '17 at 20:45
  • @Lovy I just thought of an alternative, which would be to have a thread loading the data in the background so the user doesn't have this "unresponsive" time while the data is loading; however, I'm not quite sure this is really necessary. – Jenkinx Aug 07 '17 at 20:50
  • @AnttiHaapala that's a nice option I didn't think of! I'll look into that, thanks! – Jenkinx Aug 07 '17 at 20:52
  • @TonyTannous what does sequentially mean? – Jenkinx Aug 07 '17 at 20:54
  • Sequentially means one after the other: first read the whole of file 1, then move on to the next one, instead of reading them at the same time. – Tony Tannous Aug 07 '17 at 20:55
  • @TonyTannous so I should read `bills.db` entirely, then go to the other files? – Jenkinx Aug 07 '17 at 20:57
  • It might be faster; you could try and check whether it is faster or not. I cannot tell for sure as I don't know what your code exactly does. – Tony Tannous Aug 07 '17 at 20:59
  • Why don't you use a "normal" database instead of reinventing the wheel? Faster, easier, decent query engine, safety, consistency, journalling... – 0___________ Aug 07 '17 at 20:59
  • I would imagine most of the time would be spent in formatting input - converting strings to numbers - and **not** the actual IO. – SergeyA Aug 07 '17 at 21:45
  • The issue as I see it is that you're dealing with a text file. If you could reformat your db files to be binary records with fixed sizes, then you could read an entire record in one shot. This would clean up your code and most likely speed up your performance. – DiegoSunDevil Sep 08 '17 at 23:21

1 Answer


There are a few things you should read about. The first is asymptotic time and space complexity, and the second is Big O notation; Big O notation describes how a program's running time grows with the size of its input. The code you provided does a constant amount of work per line it reads, so its running time is O(n) in the total number of lines, which is already optimal if you must read everything at start-up. Since the running time grows with the amount of data, the practical way to improve it is to store less data per record and to read as little of it as possible up front. You can read about asymptotic notation and Big O notation here.