
I would like to load a comma-separated text file into a map data structure in C++, using Visual Studio 2013 on Windows 7.

Currently, the text file has 5,000 lines and is about 300 KB.
Each line ends with a newline. I use getline(), and it takes 90 seconds to load the whole file.

The file is like:

      id , value1, value2,  value3 , … // about 50+ columns
      abc,36.1,69.15,96.358 , ….
      pwr, ….

I need the final format in a nested map data structure like the following (id is the outer key and the column name is the inner key):

    abc     value1    36.1
            value2    69.15
            value3    96.358
               … 
    pwr       …  

My C++ code:

**UPDATE**: split each line by comma.

    while (getline(file, aLine))
    {
        stringstream ssa(aLine);
        vector<string> line;
        string asubStr;
        while (getline(ssa, asubStr, ','))  // loop on getline, not ssa.good(),
            line.push_back(asubStr);        // to avoid a spurious last field

        // cast each string to double if needed, then:
        myMap[id][valueX] = y;  // X can be 1, 2, 3, … 50;
                                // y is the value from that column in the file;
                                // myMap is map<string, map<string, double>>
    }

My final file size can be 60MB and 1 million lines.

Is it possible to store all the data in a map in C++? And how can I load the file quickly? 90 seconds for 5,000 lines is far too slow.

In C++, fgets() does not work for me because I do not know the number of elements in the file.

I would like to load the file as fast as possible and then process each line in a data structure.

Thanks

**More UPDATE**: I changed the code so that it only loads each line as a string, without doing any splitting.

set<string> mySet;
while (getline(file, aLine))
{
    mySet.insert(aLine); // this is all what I do in the loop.
}

But it still took 12 seconds for 5,000 lines. So, for 1 million lines, it will take 40 minutes!

user3448011
  • Possible duplicate: [How can I read and parse CSV files in C++?](http://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) – NathanOliver Feb 26 '16 at 21:05
  • 2
    300kB in 90 seconds is 3333.[3] Bps. That's the speed of modems in the 90s of the previous century. You are doing something terribly wrong. – nsilent22 Feb 26 '16 at 21:06
  • Have you compiled in release mode ? – Christophe Feb 26 '16 at 21:10
  • If `while (getline(file, aLine))` takes 90 seconds for 300 Kb file, you have a faulty hard disk not a faulty program. – Captain Giraffe Feb 26 '16 at 21:10
  • 2
    Step 1: time just the while/getline, without the processing of the line. My guess is that that will be fast. If so, it's the processing that has issues - and you're not showing that. Always post _complete_ code that shows the problem. – Zastai Feb 26 '16 at 21:14
  • Try creating a RAMdisk and putting your program and text file in it, then run the program. Is the performance any different? The only way I can imagine things being this slow is if you're doing something like reading off of an SDcard in SPI mode rather than 4-wire mode on an Arduino. – Cloud Feb 26 '16 at 21:15
  • Please post the declaration of `myMap`? – Ahmed Akhtar Feb 26 '16 at 21:15
  • please see my UPDATE. – user3448011 Feb 26 '16 at 21:31
  • @Christophe, when I switched to "release" mode, I got a compile error telling me that a header file cannot be found. If I add it to "stdafx.h", I get the same error in "stdafx.h". – user3448011 Feb 26 '16 at 22:01
  • What speed do you achieve if you remove the map operations? – zdf Feb 26 '16 at 22:02
  • 1
    If you run in debug mode, then the speed you're achieving might be normal. – zdf Feb 26 '16 at 22:08

1 Answer


Some operating systems provide a feature called memory mapping, where the OS treats a file as if it were memory and handles paging the data in for you.

You may also want to consider block reading: read a large block of data into memory and parse it there.

The idea is to optimize the data transfer between the file and memory. Reading one line at a time is far less efficient than reading blocks of 10 KB or more.

Another technique is to use multiple threads: let one thread read data from the file into a buffer while another processes it. A possible third thread could output the results.

A simple trick is to preallocate the string's capacity to some percentage above the longest expected line. Don't keep reallocating or declaring new strings; reuse this one large string. Strings pay an execution penalty every time they have to resize.

Thomas Matthews