I'm trying to develop a C++ k-means program that will be reading text files. The problem is that the text files are not uniform.

For example, data1.txt looks like

0.1  
3.0  
0.7  
0.5  
0.2  
1.5  
1.6  

and data3.txt looks like

33 37 53
35 36 52
34 37 53
35 37 51
34 38 52
33 38 51
33 39 52
33 37 52
34 37 52
34 39 52

I'm thinking I should store the data in a vector somehow. What's the best way to develop this without forcing the user to tell the program how many dimensions each entry in the text file has? I want the user to call the program like

program data_.txt #

Where data_.txt = any text file and # = the number of clusters

GeorgeCostanza

1 Answer

You can use the boost::split() function (or this method) to get the number of entries on each line. Once you know how many entries a line has, you also know how many dimensions you're dealing with. This also lets you implement simple input validation: every line in a file should have the same number of columns.

There is also a good method using std::istringstream, which splits on any whitespace. You could use the following:

#include <sstream>
#include <string>
#include <vector>
#include <iterator>

template<typename T>
std::vector<T> split(const std::string& line) {
  std::istringstream is(line);
  return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}

Usage:

std::string line = "1.2 3.4 5.6e7";
std::vector<double> vec = split<double>(line);
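
To tie this back to the question, here is a minimal sketch of reading a whole file this way. It assumes one point per line; read_points is a hypothetical helper name, not something from Boost or the standard library, and split is repeated only so the sketch is self-contained. The first non-empty line fixes the dimension, and any later line with a different number of values is rejected.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Same helper as above: split one line on whitespace into values of type T.
template<typename T>
std::vector<T> split(const std::string& line) {
  std::istringstream is(line);
  return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}

// Hypothetical helper: reads one point per line from any input stream.
// The first point fixes the dimension; later points must match it.
std::vector<std::vector<double>> read_points(std::istream& in) {
  std::vector<std::vector<double>> points;
  std::string line;
  while (std::getline(in, line)) {
    std::vector<double> point = split<double>(line);
    if (point.empty()) continue;  // skip lines that yield no values (e.g. blank lines)
    if (!points.empty() && point.size() != points.front().size()) {
      throw std::runtime_error("inconsistent number of columns: " + line);
    }
    points.push_back(point);
  }
  return points;
}
```

In main you could open a std::ifstream on argv[1], pass it to read_points, and take points.front().size() as the dimension (after checking that points is not empty), so the user only has to supply the file name and the number of clusters. Note that this sketch stops parsing a line at the first token that is not a number, so stricter validation would need an extra check.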
lisu