Reading txt files with various dimensions as input for k-means algorithm program

Question

I'm trying to develop a C++ k-means program that reads its input from text files. The problem is that the text files are not uniform: the number of values per line differs from file to file.

For example, data1.txt looks like

and data3.txt looks like

I'm thinking I should store the data in a vector somehow. What's the best way to develop this without forcing the user to tell the program how many dimensions each entry in the text file has? I want the user to call the program like

program data_.txt #

Where data_.txt = any text file and # = the number of clusters

In the three-column input case, do you want to treat all the numbers as one input array, or are they meant to be kept as three independent arrays and then you get three answers? — John Zwinck, Oct 12 '14 at 01:35
@JohnZwinck in data3.txt, each line is one array with 3 entries — GeorgeCostanza, Oct 12 '14 at 01:58
@GeorgeCostanza: maybe it will be more clear if you tell us what you want the output to look like for each input case. — John Zwinck, Oct 12 '14 at 02:48

score 1 · Accepted Answer · edited May 23 '17 at 12:05

You can use the boost::split() function (or this method) to get the number of entries on each line. Once you know how many entries a line has, you also know how many dimensions you're dealing with. This also lets you implement simple input validation: every line in a file should have the same number of columns.

There is also a good method using std::istringstream, which splits on any whitespace (spaces or tabs). You could use the following:

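#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits a line on whitespace and parses each token as a T.
// Parsing stops at the first token that cannot be read as a T.
template<typename T>
std::vector<T> split(const std::string& line) {
  std::istringstream is(line);
  return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}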

Usage:

std::string line = "1.2 3.4 5.6e7";
std::vector<double> vec = split<double>(line);
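
To tie this back to the question, here is a minimal sketch of how the same stream-based splitting could read a whole file and infer the number of dimensions from the first data line. The function name read_points and the error message are my own (hypothetical), and skipping blank lines is an assumption about how the input should be treated, not something stated in the question.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads rows of whitespace-separated doubles from `in`.
// The first non-empty row fixes the dimension; a later row with a different
// number of entries throws std::runtime_error.
// Limitation: a token that is not a number ends parsing of that line, so the
// row comes out short (or empty, in which case it is skipped like a blank line).
std::vector<std::vector<double>> read_points(std::istream& in) {
  std::vector<std::vector<double>> points;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::istringstream is(line);
    std::istream_iterator<double> first(is), last;
    std::vector<double> row(first, last);
    if (row.empty()) {
      continue;  // assumption: blank lines are ignored
    }
    if (!points.empty() && row.size() != points.front().size()) {
      throw std::runtime_error(
          "line " + std::to_string(line_no) + ": expected " +
          std::to_string(points.front().size()) + " values, got " +
          std::to_string(row.size()));
    }
    points.push_back(std::move(row));
  }
  return points;
}
```

With the invocation from the question, you could open argv[1] with a std::ifstream, pass it to read_points, and take points.front().size() as the dimension (after checking that points is not empty), so the user never has to specify it.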

answered Oct 12 '14 at 02:09

lisu

I tried to use boost::split, but again, the files aren't uniform. Some text files have lines with numbers separated by a space and others have lines with numbers separated by a tab – GeorgeCostanza Oct 12 '14 at 03:03
I added another method – lisu Oct 12 '14 at 03:08
Thanks. The data files are supposed to be uniform, so the boost method will work. – GeorgeCostanza Oct 12 '14 at 03:33