1

I was wondering if someone could give me a hand. I'm trying to build a program that reads in a big block of floats of unknown size from a CSV file. I already wrote this in MATLAB, but I want to compile and distribute it, so I'm moving to C++.

I'm just learning, and to start I'm trying to read in this

7,5,1989
2,4,2312

from a text file.

Code so far:

// Read in CSV
//
// Alex Byasse

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <sstream>
#include <stdlib.h>

int main() {

  unsigned int number_of_lines = 0;
  FILE *infile = fopen("textread.csv", "r");
  int ch;
  int c = 0;
  bool tmp = true;
  while (EOF != (ch = getc(infile))) {
    if (',' == ch) {
      ++c;
    }
    if ('\n' == ch) {
      if (tmp) {
        int X = c;
        tmp = false;
      }
      ++number_of_lines;
    }
  }
  fclose(infile);

  std::ifstream file( "textread.csv" );

  if(!file){
    std:cerr << "Failed to open File\n";
    return 1;
  }

  const int ROWS = X;
  const int COLS = number_of_lines;
  const int BUFFSIZE = 100;
  int array[ROWS][COLS];
  char buff[BUFFSIZE];
  std::string line; 
  int col = 0;
  int row = 0;
  while( std::getline( file, line ) )
  {
    std::istringstream iss( line );
    std::string result;
    while( std::getline( iss, result, ',' ) )
      {
        array[row][col] = atoi( result.c_str() );
        std::cout << result << std::endl;
        std::cout << "column " << col << std::endl;
        std::cout << "row " << row << std::endl;
        col = col+1;
        if (col == COLS){
          std:cerr << "Went over number of columns " << COLS;
        }
      }
    row = row+1;
    if (row == ROWS){
      std::cerr << "Went over length of ROWS " << ROWS;
    }
    col = 0;
  }
  return 0;
}

The MATLAB code I use is:

fid = fopen(twoDM,'r');

s = textscan(fid,'%s','Delimiter','\n');
s = s{1};
s_e3t = s(strncmp('E3T',s,3));
s_e4q = s(strncmp('E4Q',s,3));
s_nd = s(strncmp('ND',s,2));

[~,cell_num_t,node1_t,node2_t,node3_t,mat] = strread([s_e3t{:}],'%s %u %u %u %u %u');
node4_t = node1_t;
e3t = [node1_t,node2_t,node3_t,node4_t];
[~,cell_num_q,node1_q,node2_q,node3_q,node_4_q,~] = strread([s_e4q{:}],'%s %u %u %u %u %u %u');
e4q = [node1_q,node2_q,node3_q,node_4_q];
[~,~,node_X,node_Y,~] = strread([s_nd{:}],'%s %u %f %f %f');

cell_id = [cell_num_t;cell_num_q];
[~,i] = sort(cell_id,1,'ascend');

cell_node = [e3t;e4q];
cell_node = cell_node(i,:);

Any help appreciated. Alex

Andrew Barber
Alex Byasse

4 Answers

7

I would, obviously, just use IOStreams. Reading a homogeneous array of arrays from a CSV file without having to bother with any quoting is fairly trivial:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::istream& comma(std::istream& in)
{
    if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
        in.setstate(std::ios_base::failbit);
    }
    return in.ignore();
}

int main()
{
    std::vector<std::vector<double>> values;
    std::istringstream in;
    for (std::string line; std::getline(std::cin, line); )
    {
        in.clear();
        in.str(line);
        std::vector<double> tmp;
        for (double value; in >> value; in >> comma) {
            tmp.push_back(value);
        }
        values.push_back(tmp);
    }

    for (auto const& vec: values) {
        for (auto val: vec) {
            std::cout << val << ", ";
        }
        std::cout << "\n";
    }
}

Given the simple structure of the file, the logic can actually be simplified: instead of reading the values individually, each line can be viewed as a sequence of values if the separators are skipped automatically. Since a comma isn't skipped automatically the way whitespace is, the commas are replaced by spaces before creating the string stream for each line. The corresponding code becomes:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::vector<double> > values;
    std::ifstream fin("textread.csv");
    for (std::string line; std::getline(fin, line); )
    {
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream in(line);
        values.push_back(
            std::vector<double>(std::istream_iterator<double>(in),
                                std::istream_iterator<double>()));
    }

    for (std::vector<std::vector<double> >::const_iterator
             it(values.begin()), end(values.end()); it != end; ++it) {
        std::copy(it->begin(), it->end(),
                  std::ostream_iterator<double>(std::cout, ", "));
        std::cout << "\n";
    }
}

Here is what happens:

  1. The destination, values, is defined as a vector of vectors of double. Nothing guarantees that all rows have the same size, but this is trivial to check once the file is read.
  2. An std::ifstream is defined and initialized with the file. It may be worth checking the stream after construction to see whether the file could be opened for reading (if (!fin) { std::cerr << "failed to open...\n"; return 1; }).
  3. The file is processed one line at a time. The lines are simply read into a std::string using std::getline(). When std::getline() fails, no further line could be read and the conversion ends.
  4. Once the line is read, all commas are replaced by spaces.
  5. A string stream for reading the line is constructed from the modified line. The original code reused a std::istringstream declared outside the loop to avoid constructing a new stream for every line. Since that stream ends up in a failed state once a line is exhausted, it first had to be reset with in.clear() before its content was set with in.str(line).
  6. The individual values are iterated using an std::istream_iterator<double>, which reads values from the stream it was constructed with. The iterator constructed from in marks the start of the sequence, and the default-constructed iterator marks the end.
  7. The sequence of values produced by the iterators is used to immediately construct a temporary std::vector<double> representing a row.
  8. The temporary vector is pushed to the end of the target array.

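Everything after that simply prints the contents of the resulting matrix. The first version does this with C++11 features (range-based for and variables with automatically deduced type); this version uses iterators and std::copy so that it also compiles as C++03.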

Dietmar Kühl
  • This looks pretty good, but I want to learn along the way, and I don't understand a lot of what you're doing in this code. Could you please put in some comments? – Alex Byasse Sep 16 '13 at 00:32
  • @AlexByasse: The only unusual bit is `comma()` which is simply a _manipulator_ (this should allow finding documentation what this does). With that, you can just look at the definition of the various components used. Yes, I could write a lengthy essay describing what is being done but you can as well research the various bits. – Dietmar Kühl Sep 16 '13 at 00:36
  • @AlexByasse: I don't. I'm clearly reading from `std::cin`. If you want to have it read from a file you'll need to replace that with a suitably initialized `std::ifstream`. – Dietmar Kühl Sep 16 '13 at 00:42
  • Thanks for your answer, but don't be so harsh. I'm only learning and transitioning from MATLAB, which is much simpler; I'm an engineer, not a computer scientist. Thanks anyway for a good answer. I've made yours the accepted answer even though I don't fully understand it. Could you at least tell me what's wrong, generally, with my approach? – Alex Byasse Sep 16 '13 at 01:03
  • @AlexByasse: 1. You read the file twice using both stdio and IOStreams, 2. several variables are not correctly scoped, 3. C++ doesn't have variable length arrays but uses `std::vector` instead, 4. it seems wasteful to extract strings which are then parsed instead of parsing values directly, 5. the result of reading the values is unchecked, 6. `atoi()` is for integers not for floats, 7. the arrays are declared to use `int` as well although your description states floats, 8. reading individual characters is slow, 9. `std:cerr` is the label `std` followed by a name (you wanted `std::cerr`). – Dietmar Kühl Sep 16 '13 at 01:27
  • @AlexByasse: I have added a simpler version of reading the values - it didn't occur to me earlier. There are some comments on what it does below the code. – Dietmar Kühl Sep 16 '13 at 01:58
  • @AlexByasse, you can look up the functions being used here: http://en.cppreference.com/w/ – Adam Burry Sep 16 '13 at 02:50
  • @DietmarKühl: Thanks, I get what you're doing in the second method. It's really helping me on my way to replacing MATLAB with C++, which I think is better for the work I'm doing (the non-graphical component of it anyway; by the way, I'm a coastal engineer building tools for setting up hydrodynamic models and visualizing results). Thanks for the time and effort, I really appreciate it. – Alex Byasse Sep 16 '13 at 04:04
  • @AlexByasse, see my benchmarking post for these solutions below. – Adam Burry Sep 16 '13 at 04:20
  • When compiling method 2 I get errors: read_csv.cpp: In function 'int main()': read_csv.cpp:15:35: error: '>>' should be '> >' within a nested template argument list read_csv.cpp:26:25: error: expected initializer before ':' token read_csv.cpp:32:1: error: expected primary-expression at end of input read_csv.cpp:32:1: error: expected ';' at end of input read_csv.cpp:32:1: error: expected primary-expression at end of input read_csv.cpp:32:1: error: expected ')' at end of input read_csv.cpp:32:1: error: expected statement at end of input read_csv.cpp:32:1: error: expected '}' at end of input – Alex Byasse Sep 16 '13 at 04:21
  • @AlexByasse: The code as copied certainly compiles for me. The line numbers in your errors don't line up with the code when copied and pasted, i.e., you changed something in the source. That said, the first error suggests you aren't compiling as C++11. I have changed the code so it also compiles with C++03. – Dietmar Kühl Sep 16 '13 at 05:58
1

As proposed here, changing the delimiter passed to getline may help you read the CSV file, but you need to convert each field from string to int.

To deal with any number of rows and columns, you can use a multidimensional vector (a vector inside a vector, as described here): each row goes into its own vector, and all rows go into the outer vector.

Polla A. Fattah
0
int fclose(infile);

This line is wrong. The compiler treats it as the declaration of an int variable named fclose initialized with infile, a FILE*, which doesn't compile. If you're simply trying to close the file, it should be:

fclose(infile);
David G
0

I intended this as an edit to Dietmar Kühl's solution, but it was rejected as too large an edit...

The usual reason given for converting MATLAB to C++ is performance, so I benchmarked these two solutions. I compiled with G++ 4.7.3 for Cygwin with the options "-Wall -Wextra -std=c++0x -O3 -fwhole-program" and tested on a 32-bit Intel Atom N550.

As input I used two 10,000-line files: the first had 10 "0.0" values per line, the second 100 "0.0" values per line.

I timed from the command line using time and took the average of user+sys over three runs.

I modified the second program to read from std::cin as in the first program.

Finally, I ran the tests again with std::cin.sync_with_stdio(false);

Results (time in seconds; prog A is the first solution, prog B the second):

                 synced with stdio       sync disabled
         10/line   100/line     10/line   100/line
prog A     1.839     16.873       0.721      6.228
prog B     1.741     16.098       0.721      5.563
The obvious conclusion is that version B is slightly faster, but more importantly, you should disable syncing with stdio.

Adam Burry
  • Could you send me the .cpp of the second method that you compiled, by email to alex.byasse@bmtwbm.com.au, or did you just copy and paste the second method? And how do you turn off syncing? – Alex Byasse Sep 16 '13 at 04:27
  • Syncing is only relevant if you read from a standard stream such as std::cin. If you read from an ifstream, as it appears you intend, then you may ignore it. In other words, Dietmar's second solution is tailor made for you. – Adam Burry Sep 16 '13 at 04:50
  • I'm getting a compilation error when I copy his solution. I posted it at the end of the original question. Could you have a look and tell me what I'm doing wrong, please? – Alex Byasse Sep 16 '13 at 05:32